Repository: incubator-hawq-docs Updated Branches: refs/heads/master 458ef8ea4 -> 40d9a0b6d
HAWQ-1435 document new pxf jdbc plug-in (closes #124) Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/40d9a0b6 Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/40d9a0b6 Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/40d9a0b6 Branch: refs/heads/master Commit: 40d9a0b6d27ab37d831827e9e845f3d5d8b0040b Parents: 458ef8e Author: Lisa Owen <[email protected]> Authored: Wed Jun 28 09:43:56 2017 -0700 Committer: David Yozie <[email protected]> Committed: Wed Jun 28 09:43:56 2017 -0700 ---------------------------------------------------------------------- .../source/subnavs/apache-hawq-nav.erb | 1 + .../pxf/HawqExtensionFrameworkPXF.html.md.erb | 4 + markdown/pxf/InstallPXFPlugins.html.md.erb | 4 +- markdown/pxf/JdbcPXF.html.md.erb | 211 +++++++++++++++++++ 4 files changed, 218 insertions(+), 2 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/40d9a0b6/book/master_middleman/source/subnavs/apache-hawq-nav.erb ---------------------------------------------------------------------- diff --git a/book/master_middleman/source/subnavs/apache-hawq-nav.erb b/book/master_middleman/source/subnavs/apache-hawq-nav.erb index f18edef..85eb424 100644 --- a/book/master_middleman/source/subnavs/apache-hawq-nav.erb +++ b/book/master_middleman/source/subnavs/apache-hawq-nav.erb @@ -398,6 +398,7 @@ <li><a href="/docs/userguide/2.2.0.0-incubating/pxf/HivePXF.html">Accessing Hive Data</a></li> <li><a href="/docs/userguide/2.2.0.0-incubating/pxf/HBasePXF.html">Accessing HBase Data</a></li> <li><a href="/docs/userguide/2.2.0.0-incubating/pxf/JsonPXF.html">Accessing JSON Data</a></li> + <li><a href="/docs/userguide/2.2.0.0-incubating/pxf/JdbcPXF.html">Accessing External SQL Databases</a></li> <li><a href="/docs/userguide/2.2.0.0-incubating/pxf/HDFSWritablePXF.html">Writing Data to HDFS</a></li> <li><a href="/docs/userguide/2.2.0.0-incubating/pxf/ReadWritePXF.html">Using Profiles to Read and Write Data</a></li> <li><a href="/docs/userguide/2.2.0.0-incubating/pxf/PXFExternalTableandAPIReference.html">PXF External Tables and API</a></li> http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/40d9a0b6/markdown/pxf/HawqExtensionFrameworkPXF.html.md.erb ---------------------------------------------------------------------- diff --git a/markdown/pxf/HawqExtensionFrameworkPXF.html.md.erb b/markdown/pxf/HawqExtensionFrameworkPXF.html.md.erb index 1a49576..8d82d8a 100644 --- a/markdown/pxf/HawqExtensionFrameworkPXF.html.md.erb +++ b/markdown/pxf/HawqExtensionFrameworkPXF.html.md.erb @@ -51,6 +51,10 @@ PXF allows users to create custom connectors to access other parallel data store This topic describes how to access JSON data using PXF. +- **[Accessing External SQL Databases](../pxf/JdbcPXF.html)** + + This topic describes how to access data in external SQL databases using PXF. + - **[Writing Data to HDFS](../pxf/HDFSWritablePXF.html)** This topic describes how to write to HDFS using PXF. http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/40d9a0b6/markdown/pxf/InstallPXFPlugins.html.md.erb ---------------------------------------------------------------------- diff --git a/markdown/pxf/InstallPXFPlugins.html.md.erb b/markdown/pxf/InstallPXFPlugins.html.md.erb index 4fbda74..8060e5e 100644 --- a/markdown/pxf/InstallPXFPlugins.html.md.erb +++ b/markdown/pxf/InstallPXFPlugins.html.md.erb @@ -21,7 +21,7 @@ specific language governing permissions and limitations under the License. --> -This topic describes how to install the built-in PXF service plug-ins that are required to connect PXF to HDFS, Hive, HBase, and JSON. +This topic describes how to install the built-in PXF service plug-ins that are required to connect PXF to HDFS, Hive, HBase, JDBC, and JSON. **Note:** PXF requires that you run Tomcat on the host machine. Tomcat reserves ports 8005, 8080, and 8009. If you have configured Oozie JXM reporting on a host that will run PXF, make sure that the reporting service uses a port other than 8005. This helps to prevent port conflict errors from occurring when you start the PXF service. @@ -48,7 +48,7 @@ Each PXF service plug-in resides in its own RPM. You may have built these RPMs Perform the following steps on **_each_** node in your cluster to install PXF: -1. Install the PXF software, including Apache, the PXF service, and all PXF plug-ins: HDFS, HBase, Hive, JSON: +1. Install the PXF software, including Apache, the PXF service, and all PXF plug-ins: HDFS, HBase, Hive, JDBC, JSON: ```shell $ sudo yum install -y pxf http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/40d9a0b6/markdown/pxf/JdbcPXF.html.md.erb ---------------------------------------------------------------------- diff --git a/markdown/pxf/JdbcPXF.html.md.erb b/markdown/pxf/JdbcPXF.html.md.erb new file mode 100644 index 0000000..337de66 --- /dev/null +++ b/markdown/pxf/JdbcPXF.html.md.erb @@ -0,0 +1,211 @@ +--- +title: Accessing External SQL Databases with JDBC (Beta) +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Some of your data may already reside in an external SQL database. The PXF JDBC plug-in reads data stored in SQL databases including MySQL, ORACLE, PostgreSQL, and Hive. + +This section describes how to use PXF with JDBC, including an example of creating and querying an external table that accesses data in a MySQL database table. + +## <a id="jdbcplugin_prereq"></a>Prerequisites + +Before accessing external SQL databases using HAWQ and PXF, ensure that: + +- The JDBC plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information. +- The JDBC driver JAR files for the external SQL database are installed on all cluster nodes. +- The file locations of external SQL database JDBC JAR files are added to `pxf-public.classpath`. If you manage your HAWQ cluster with Ambari, add the JARS via the Ambari UI. If you managed your cluster from the command line, edit the `/etc/pxf/conf/pxf-public.classpath` file directly. + + +## <a id="hdfsplugin_queryextdata"></a>Querying External SQL Data +The PXF JDBC plug-in supports the single profile named `Jdbc`. + +Use the following syntax to create a HAWQ external table representing external SQL database tables you access via JDBC:Â + +``` sql +CREATE [READABLE | WRITABLE] EXTERNAL TABLE <table_name> + ( <column_name> <data_type> [, ...] | LIKE <other_table> ) +LOCATION ('pxf://<host>[:<port>]/<jdbc-schema-name>.<jdbc-database-name>.<jdbc-table-name> + ?PROFILE=Jdbc[&<custom-option>=<value>[...]]') +FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'); +``` + +JDBC-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below. + +| Keyword | Value | +|-------|-------------------------------------| +| \<column-name\> | The name of the PXF external table column. The PXF \<column-name\> must exactly match the \<column-name\> used in the external SQL table.| +| \<data-type\> | The data type of the PXF external table column. The PXF \<data-type\> must be equivalent to the data type used for \<column-name\> in the SQL table.| +| \<host\> | The PXF host. While \<host\> may identify any PXF agent node, use the HDFS NameNode as it is guaranteed to be available in a running HDFS cluster. If HDFS High Availability is enabled, \<host\> must identify the HDFS NameService. | +| \<port\> | The PXF port. If \<port\> is omitted, PXF assumes \<host\> identifies a High Availability HDFS Nameservice and connects to the port number designated by the `pxf_service_port` server configuration parameter value. Default is 51200. | +| \<jdbc-schema-name\> | The schema name. The default schema name is `default`. | +| \<jdbc-database-name\> | The database name. The default database name is determined by the external SQL server. | +| \<jdbc-table-name\> | The table name. | +| PROFILE | The `PROFILE` keyword must specify `Jdbc`. | +| \<custom-option\> | The custom options supported by the `Jdbc` profile are discussed later in this section.| +| FORMAT 'CUSTOM' | The JDBC `CUSTOM` `FORMAT` supports only the built-in `'pxfwritable_import'` `FORMATTER` property. | + +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification. + + +### <a id="profile_jdbcoptipns"></a>JDBC Custom Options + +You include custom options in the `LOCATION` URI. Preface each option with an ampersand `&`. + +The JDBC plug-in `Jdbc` profile supports the following \<custom-option\>s. The JDBC connection set up options are required. + +| Option Name | Description +|---------------|--------------------| +| JDBC_DRIVER | The JDBC driver class name. (Required) | +| DB_URL | The URL to the database; includes the hostname, port, and database name. (Required) | +| USER | The database user name. (Required) | +| PASS | The database password for USER. (Required) | +| PARTITION_BY | The partition column, \<column-name\>:\<column-type\>. The JDBC plug-in supports `date`, `int`, and `enum` \<column-type\>s. Use the `yyyy-MM-dd` format for the `date` \<column-type\>. A null `PARTITION_BY` defaults to a single fragment. | +| RANGE | (Used only when `PARTITION_BY` is specified.) The query range, \<start-value\>[:\<end-value\>]. \<end-value\> may be empty for an `int` \<column-type\>. The `RANGE` is left closed, right open. That is, the range includes the \<start-value\> but does *not* include the \<end-value\>.| +| INTERVAL | (Used only when `PARTITION_BY` is specified.) The interval, \<interval-num\>[:\<interval-unit\>], of one fragment. `INTERVAL` may be empty for an `enum` \<column-type\>. \<interval-unit\> may be empty for an `int` \<column-type\>. | + +Example JDBC \<custom-option\> connection string: + +``` pre +&JDBC_DRIVER=com.mysql.jdbc.Driver&DB_URL=jdbc:mysql://<dbhost>:<dbport>/testdb&USER=user1&PASS=changeme +``` + +Example JDBC \<custom-option\> substrings identifying partitioning parameters: + +``` pre +&PARTITION_BY=year:int&RANGE=2011:2013&INTERVAL=1 +&PARTITION_BY=createdate:date&RANGE=2013-01-01:2016-01-01&INTERVAL=1:month +&PARTITION_BY=color:enum&RANGE=red:yellow:blue +``` + +### <a id="jdbc_example_mysql"></a>Example: Using the Jdbc Profile to Access a MySQL Database Table + + +#### <a id="ex_create_mysqltbl"></a>Create a MySQL Table + +Perform the following steps to create a MySQL table named `mysql_table1` in the default schema of a database named `mtestdb`: + +1. Connect to the default MySQL database as the root user: + + ``` shell + $ mysql --user=root -p + ``` + +2. Create a database named `mtestdb`: + + ``` sql + mysql> CREATE DATABASE mtestdb; + ``` + +2. Assign privileges to `mtestdb` to a user named `muser1` and then exit: + + ``` sql + mysql> GRANT ALL on mtestdb.* TO 'muser1'@'localhost' IDENTIFIED BY 'muser1'; + mysql> \q + ``` + +2. Connect to `mtestdb` as user `muser1`: + + ``` shell + $ mysql --user=muser1 mtestdb -pmuser1 + ``` + +2. Create a table named `mysql_table1` and insert some data into this table: + + ``` sql + mysql> CREATE TABLE mysql_table1(id int); + mysql> INSERT INTO mysql_table1 VALUES (1); + mysql> INSERT INTO mysql_table1 VALUES (2); + mysql> INSERT INTO mysql_table1 VALUES (3); + ``` + +3. Determine the MySQL server hostname and port: + + ``` sql + mysql> SHOW VARIABLES LIKE 'hostname'; + +---------------+-----------------+ + | Variable_name | Value | + +---------------+-----------------+ + | hostname | mydb.server.com | + +---------------+-----------------+ + + mysql> SHOW VARIABLES LIKE 'port'; + +---------------+-------+ + | Variable_name | Value | + +---------------+-------+ + | port | 3306 | + +---------------+-------+ + 1 row in set (0.00 sec) + ``` + + Make note of the hostname and port values returned. + +4. Construct the JDBC connection string, substituting your MySQL server hostname and port number: + + ``` pre + &JDBC_DRIVER=com.mysql.jdbc.Driver&DB_URL=jdbc:mysql://mydb.server.com:3306/mtestdb&USER=muser1&PASS=muser1 + ``` + + Save this string for use later. + +#### <a id="ex_jdbconfig"></a>Configure PXF + +If not already present on your system, download and copy the MySQL connector JAR file to your system and update `pxf-public.classpath` with the location of this file. + +For example, if you manage your HAWQ cluster with Ambari, and the MySQL connector jar file is located in the `/usr/share/java` directory, add the following line: + +``` pre +/usr/share/java/mysql-connector*.jar +``` + +in the Ambari **PXF** service **Configs > Advanced pxf-public.classpath** pane and then use the **Restart** button to restart PXF. + +If you manage your HAWQ cluster from the command line, directly edit the `/etc/pxf/conf/pxf-public.classpath` file, adding the MySQL connector JAR file path. Then restart PXF on *each* HAWQ node with the following command: + +``` shell +root@hawq-node$ sudo service pxf-service restart +``` + +#### <a id="ex_usejdbc"></a>Query Using the Jdbc Profile + +Perform the following steps to create and query an external PXF table to access the `mysql_table1` table you created in the previous section: + +1. Use the `Jdbc` profile to create an external table to access the MySQL `mysql_table1` table. For example: + + ``` sql + gpadmin=# CREATE EXTERNAL TABLE pxf_jdbc_mysql_table1(id int) + LOCATION ('pxf://<host>:51200/mtestdb.mysql_table1?PROFILE=JDBC&JDBC_DRIVER=com.mysql.jdbc.Driver&DB_URL=jdbc:mysql://<dbserver-host>:<port>/mtestdb&USER=hawquser1&PASS=hawquser1') + FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'); + ``` + + Substitute your PXF \<host\>, as well as the `DB_URL` string you constructed in the previous exercise. + +2. Display all rows of the `pxf_jdbc_mysql_table1` table: + + ``` sql + gpadmin=# SELECT * FROM pxf_jdbc_mysql_table1; + id + ---- + 1 + 2 + 3 + (3 rows) + ``` +
