Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/124#discussion_r124583252 --- Diff: markdown/pxf/JdbcPXF.html.md.erb --- @@ -0,0 +1,213 @@ +--- +title: Accessing External SQL Databases with JDBC (Beta) +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +Some of your data may already reside in an external SQL database. The PXF JDBC plug-in reads data stored in SQL databases including MySQL, ORACLE, PostgreSQL, and Hive. + +This section describes how to use PXF with JDBC, including an example of creating and querying an external table that accesses data in a MySQL database table. + +## <a id="jdbcplugin_prereq"></a>Prerequisites + +Before accessing external SQL databases using HAWQ and PXF, ensure that: + +- The JDBC plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information. +- The JDBC driver JAR files for the external SQL database are installed on all cluster nodes. +- The file locations of external SQL database JDBC JAR files are added to `pxf-public.classpath`. If you manage your HAWQ cluster with Ambari, add the JARS via the Ambari UI. If you managed your cluster from the command line, edit the `/etc/pxf/conf/pxf-public.classpath` file directly. + + +## <a id="hdfsplugin_queryextdata"></a>Querying External SQL Data +The PXF JDBC plug-in supports the single profile named `Jdbc`. + +Use the following syntax to create a HAWQ external table representing external SQL database tables you access via JDBC:Â + +``` sql +CREATE [READABLE | WRITABLE] EXTERNAL TABLE <table_name> + ( <column_name> <data_type> [, ...] | LIKE <other_table> ) +LOCATION ('pxf://<host>[:<port>]/<jdbc-schema-name>.<jdbc-database-name>.<jdbc-table-name> + ?PROFILE=Jdbc[&<custom-option>=<value>[...]]') +FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'); +``` + +JDBC-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below. + +| Keyword | Value | +|-------|-------------------------------------| +| \<column-name\> | The name of the PXF external table column. The PXF \<column-name\> must exactly match the \<column-name\> used in the external SQL table.| +| \<data-type\> | The data type of the PXF external table column. The PXF \<data-type\> must be equivalent to the data type used for \<column-name\> in the SQL table.| +| \<host\> | The PXF host. While \<host\> may identify any PXF agent node, use the HDFS NameNode as it is guaranteed to be available in a running HDFS cluster. If HDFS High Availability is enabled, \<host\> must identify the HDFS NameService. | +| \<port\> | The PXF port. If \<port\> is omitted, PXF assumes \<host\> identifies a High Availability HDFS Nameservice and connects to the port number designated by the `pxf_service_port` server configuration parameter value. Default is 51200. | +| \<jdbc-schema-name\> | The schema name. The default schema name is `default`. | +| \<jdbc-database-name\> | The database name. The default database name is determined by the external SQL server. | +| \<jdbc-table-name\> | The table name. | +| PROFILE | The `PROFILE` keyword must specify `Jdbc`. | +| \<custom-option\> | The custom options supported by the `Jdbc` profile are discussed later in this section.| +| FORMAT 'CUSTOM' | The JDBC `CUSTOM` `FORMAT` supports only the built-in `'pxfwritable_import'` `FORMATTER` property. | + +*Note*: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification. + + +### <a id="profile_jdbcoptipns"></a>JDBC Custom Options + +You may include one or more custom options in the `LOCATION` URI. Preface each option with an ampersand `&`. + +The JDBC plug-in `Jdbc` profile supports the following \<custom-option\>s: + +| Option Name | Description +|---------------|--------------------| +| JDBC_DRIVER | The JDBC driver class name. | +| DB_URL | The URL to the database; includes the hostname, port, and database name. | +| USER | The database user name. | +| PASS | The database password for USER. | +| PARTITION_BY | The partition column, \<column-name\>:\<column-type\>. The JDBC plug-in supports `date`, `int`, and `enum` \<column-type\>s. Use the `yyyy-MM-dd` format for the `date` \<column-type\>. A null `PARTITION_BY` defaults to a single fragment. | +| RANGE | The query range, \<start-value\>[:\<end-value\>]. \<end-value\> may be empty for an `int` \<column-type\>. | +| INTERVAL | The interval, \<interval-num\>[:\<interval-unit\>], of one fragment. `INTERVAL` may be empty for an `enum` \<column-type\>. \<interval-unit\> may be empty for an `int` \<column-type\>. | + +**Note**: The partition `RANGE` interval is left closed, right open. That is, the range includes the \<start-value\> but does *not* include the \<end-value\>. --- End diff -- I'd move this (not as a note, necessarily, as MD formatting won't work) to the RANGE description in the table.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---