Repository: incubator-hawq-docs
Updated Branches:
  refs/heads/develop dcb5cadfc -> 5714ce5b3

HAWQ-1376 - clarify pxf host and port description (closes #99)

Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/5714ce5b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/5714ce5b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/5714ce5b

Branch: refs/heads/develop
Commit: 5714ce5b3efb61387e6479907ada58f5aa8f34aa
Parents: dcb5cad
Author: Lisa Owen <[email protected]>
Authored: Thu Mar 9 18:15:45 2017 -0800
Committer: David Yozie <[email protected]>
Committed: Thu Mar 9 18:15:45 2017 -0800

----------------------------------------------------------------------
 .../HAWQFilespacesandHighAvailabilityEnabledHDFS.html.md.erb | 4 ++++
 markdown/pxf/HBasePXF.html.md.erb | 2 +-
 markdown/pxf/HDFSFileDataPXF.html.md.erb | 3 ++-
 markdown/pxf/HDFSWritablePXF.html.md.erb | 3 ++-
 markdown/pxf/HivePXF.html.md.erb | 3 ++-
 markdown/pxf/JsonPXF.html.md.erb | 5 +++--
 markdown/pxf/PXFExternalTableandAPIReference.html.md.erb | 4 ++--
 markdown/pxf/TroubleshootingPXF.html.md.erb | 2 +-
 markdown/reference/sql/CREATE-EXTERNAL-TABLE.html.md.erb | 4 ++--
 9 files changed, 19 insertions(+), 11 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/5714ce5b/markdown/admin/HAWQFilespacesandHighAvailabilityEnabledHDFS.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/admin/HAWQFilespacesandHighAvailabilityEnabledHDFS.html.md.erb b/markdown/admin/HAWQFilespacesandHighAvailabilityEnabledHDFS.html.md.erb
index 6923494..20892f6 100644
--- a/markdown/admin/HAWQFilespacesandHighAvailabilityEnabledHDFS.html.md.erb
+++ b/markdown/admin/HAWQFilespacesandHighAvailabilityEnabledHDFS.html.md.erb
@@ -240,3 +240,7 @@ For command-line administrators:
 $ hawq init standby -n -M fast
 ```

+
+## <a id="pxfnhdfsnamenode"></a>Using PXF with 
HDFS NameNode HA + +If HDFS NameNode High Availability is enabled, use the HDFS Nameservice ID in the `LOCATION` clause \<host\> field when invoking any PXF `CREATE EXTERNAL TABLE` command. If the \<port\> is omitted from the `LOCATION` URI, PXF connects to the port number designated by the `pxf_service_port` server configuration parameter value (default is 51200). \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/5714ce5b/markdown/pxf/HBasePXF.html.md.erb ---------------------------------------------------------------------- diff --git a/markdown/pxf/HBasePXF.html.md.erb b/markdown/pxf/HBasePXF.html.md.erb index 3be06d2..ddb86d5 100644 --- a/markdown/pxf/HBasePXF.html.md.erb +++ b/markdown/pxf/HBasePXF.html.md.erb @@ -43,7 +43,7 @@ To create an external HBase table, use the following syntax: ``` sql CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name ( column_name data_type [, ...] | LIKE other_table ) -LOCATION ('pxf://namenode[:port]/hbase-table-name?Profile=HBase') +LOCATION ('pxf://host[:port]/hbase-table-name?Profile=HBase') FORMAT 'CUSTOM' (Formatter='pxfwritable_import'); ``` http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/5714ce5b/markdown/pxf/HDFSFileDataPXF.html.md.erb ---------------------------------------------------------------------- diff --git a/markdown/pxf/HDFSFileDataPXF.html.md.erb b/markdown/pxf/HDFSFileDataPXF.html.md.erb index 6780650..47b964f 100644 --- a/markdown/pxf/HDFSFileDataPXF.html.md.erb +++ b/markdown/pxf/HDFSFileDataPXF.html.md.erb @@ -100,7 +100,8 @@ HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](.. | Keyword | Value | |-------|-------------------------------------| -| \<host\>[:\<port\>] | The HDFS NameNode and port. | +| \<host\> | The PXF host. While \<host\> may identify any PXF agent node, use the HDFS NameNode as it is guaranteed to be available in a running HDFS cluster. 
If HDFS High Availability is enabled, \<host\> must identify the HDFS NameService. | +| \<port\> | The PXF port. If \<port\> is omitted, PXF assumes \<host\> identifies a High Availability HDFS Nameservice and connects to the port number designated by the `pxf_service_port` server configuration parameter value. Default is 51200. | | \<path-to-hdfs-file\> | The path to the file in the HDFS data store. | | PROFILE | The `PROFILE` keyword must specify one of the values `HdfsTextSimple`, `HdfsTextMulti`, or `Avro`. | | \<custom-option\> | \<custom-option\> is profile-specific. Profile-specific options are discussed in the relevant profile topic later in this section.| http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/5714ce5b/markdown/pxf/HDFSWritablePXF.html.md.erb ---------------------------------------------------------------------- diff --git a/markdown/pxf/HDFSWritablePXF.html.md.erb b/markdown/pxf/HDFSWritablePXF.html.md.erb index 021b6b9..0c498a2 100644 --- a/markdown/pxf/HDFSWritablePXF.html.md.erb +++ b/markdown/pxf/HDFSWritablePXF.html.md.erb @@ -54,7 +54,8 @@ HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](.. | Keyword | Value | |-------|-------------------------------------| -| \<host\>[:\<port\>] | The HDFS NameNode and port. | +| \<host\> | The PXF host. While \<host\> may identify any PXF agent node, use the HDFS NameNode as it is guaranteed to be available in a running HDFS cluster. If HDFS High Availability is enabled, \<host\> must identify the HDFS NameService. | +| \<port\> | The PXF port. If \<port\> is omitted, PXF assumes \<host\> identifies a High Availability HDFS Nameservice and connects to the port number designated by the `pxf_service_port` server configuration parameter value. Default is 51200. | | \<path-to-hdfs-file\> | The path to the file in the HDFS data store. | | PROFILE | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. 
| | \<custom-option\> | \<custom-option\> is profile-specific. These options are discussed in the next topic.| http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/5714ce5b/markdown/pxf/HivePXF.html.md.erb ---------------------------------------------------------------------- diff --git a/markdown/pxf/HivePXF.html.md.erb b/markdown/pxf/HivePXF.html.md.erb index 6101016..bc4e9f6 100644 --- a/markdown/pxf/HivePXF.html.md.erb +++ b/markdown/pxf/HivePXF.html.md.erb @@ -332,7 +332,8 @@ Hive-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](.. | Keyword | Value | |-------|-------------------------------------| -| \<host\>[:<port\>] | The HDFS NameNode and port. | +| \<host\> | The PXF host. While \<host\> may identify any PXF agent node, use the HDFS NameNode as it is guaranteed to be available in a running HDFS cluster. If HDFS High Availability is enabled, \<host\> must identify the HDFS NameService. | +| \<port\> | The PXF port. If \<port\> is omitted, PXF assumes \<host\> identifies a High Availability HDFS Nameservice and connects to the port number designated by the `pxf_service_port` server configuration parameter value. Default is 51200. | | \<hive-db-name\> | The name of the Hive database. If omitted, defaults to the Hive database named `default`. | | \<hive-table-name\> | The name of the Hive table. | | PROFILE | The `PROFILE` keyword must specify one of the values `Hive`, `HiveText`, or `HiveRC`. 
| http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/5714ce5b/markdown/pxf/JsonPXF.html.md.erb ---------------------------------------------------------------------- diff --git a/markdown/pxf/JsonPXF.html.md.erb b/markdown/pxf/JsonPXF.html.md.erb index 5f156c4..6aeea7e 100644 --- a/markdown/pxf/JsonPXF.html.md.erb +++ b/markdown/pxf/JsonPXF.html.md.erb @@ -169,7 +169,8 @@ JSON-plug-in-specific keywords and values used in the `CREATE EXTERNAL TABLE` ca | Keyword | Value | |-------|-------------------------------------| -| \<host\> | Specify the HDFS NameNode in the \<host\> field. | +| \<host\> | The PXF host. While \<host\> may identify any PXF agent node, use the HDFS NameNode as it is guaranteed to be available in a running HDFS cluster. If HDFS High Availability is enabled, \<host\> must identify the HDFS NameService. | +| \<port\> | The PXF port. If \<port\> is omitted, PXF assumes \<host\> identifies a High Availability HDFS Nameservice and connects to the port number designated by the `pxf_service_port` server configuration parameter value. Default is 51200. | | PROFILE | The `PROFILE` keyword must specify the value `Json`. | | IDENTIFIER | Include the `IDENTIFIER` keyword and \<value\> in the `LOCATION` string only when accessing a JSON file with multi-line records. \<value\> should identify the member name used to determine the encapsulating JSON object to return. (If the JSON file is the multi-line record Example 2 above, `&IDENTIFIER=created_at` would be specified.) | | FORMAT | The `FORMAT` clause must specify `CUSTOM`. 
| @@ -213,4 +214,4 @@ To query this external table populated with JSON data: ``` sql SELECT * FROM sample_json_multiline_tbl; -``` \ No newline at end of file +``` http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/5714ce5b/markdown/pxf/PXFExternalTableandAPIReference.html.md.erb ---------------------------------------------------------------------- diff --git a/markdown/pxf/PXFExternalTableandAPIReference.html.md.erb b/markdown/pxf/PXFExternalTableandAPIReference.html.md.erb index 8a29d1d..3681079 100644 --- a/markdown/pxf/PXFExternalTableandAPIReference.html.md.erb +++ b/markdown/pxf/PXFExternalTableandAPIReference.html.md.erb @@ -53,8 +53,8 @@ FORMAT 'custom' (formatter='pxfwritable_import|pxfwritable_export'); | Parameter | Value and description | |-------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| host | The HDFS NameNode. | -| port | Connection port for the PXF service. If the port is omitted, PXF assumes that High Availability (HA) is enabled and connects to the HA name service port, 51200, by default. The HA name service port can be changed by setting the `pxf_service_port` configuration parameter. | +| \<host\> | The PXF host. While \<host\> may identify any PXF agent node, use the HDFS NameNode as it is guaranteed to be available in a running HDFS cluster. If HDFS High Availability is enabled, \<host\> must identify the HDFS NameService. | +| \<port\> | The PXF port. If \<port\> is omitted, PXF assumes \<host\> identifies a High Availability HDFS Nameservice and connects to the port number designated by the `pxf_service_port` server configuration parameter value. Default is 51200. | | \<path\-to\-data\> | A directory, file name, wildcard pattern, table name, etc. 
| | PROFILE | The profile PXF uses to access the data. PXF supports multiple plug-ins that currently expose profiles named `HBase`, `Hive`, `HiveRC`, `HiveText`, `HiveORC`, `HdfsTextSimple`, `HdfsTextMulti`, `Avro`, `SequenceWritable`, and `Json`. | | FRAGMENTER | The Java class the plug-in uses for fragmenting data. Used for READABLE external tables only. | http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/5714ce5b/markdown/pxf/TroubleshootingPXF.html.md.erb ---------------------------------------------------------------------- diff --git a/markdown/pxf/TroubleshootingPXF.html.md.erb b/markdown/pxf/TroubleshootingPXF.html.md.erb index 57fe9d5..cf1ef13 100644 --- a/markdown/pxf/TroubleshootingPXF.html.md.erb +++ b/markdown/pxf/TroubleshootingPXF.html.md.erb @@ -81,7 +81,7 @@ The following table lists some common errors encountered while using PXF: </tr> <tr class="odd"> <td>ERROR: fail to get filesystem credential for uri hdfs://<namenode>:8020/</td> -<td>Secure PXF: Wrong HDFS host or port is not 8020 (this is a limitation that will be removed in the next release)</td> +<td>Secure PXF: Wrong HDFS host or port is not 8020</td> </tr> <tr class="even"> <td>ERROR: remote component error (413) from '<x>': HTTP status code is 413 but HTTP response string is empty</td> http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/5714ce5b/markdown/reference/sql/CREATE-EXTERNAL-TABLE.html.md.erb ---------------------------------------------------------------------- diff --git a/markdown/reference/sql/CREATE-EXTERNAL-TABLE.html.md.erb b/markdown/reference/sql/CREATE-EXTERNAL-TABLE.html.md.erb index c46870c..c458cae 100644 --- a/markdown/reference/sql/CREATE-EXTERNAL-TABLE.html.md.erb +++ b/markdown/reference/sql/CREATE-EXTERNAL-TABLE.html.md.erb @@ -165,7 +165,7 @@ The `FORMAT` clause is used to describe how external table files are formatted. 
<dd>The data type of the column.</dd> <dt>LOCATION ('\<protocol\>://\<host\>\[:\<port\>\]/\<path\>/\<file\>' \[, ...\]) </dt> -<dd>For readable external tables, specifies the URI of the external data source(s) to be used to populate the external table or web table. Regular readable external tables allow the `file`, `gpfdist`, and `pxf` protocols. Web external tables allow the `http` protocol. If \<port\> is omitted, the `http` and `gpfdist` protocols assume port `8080` and the `pxf` protocol assumes the \<host\> is a high availability nameservice string. If using the `gpfdist` protocol, the \<path\> is relative to the directory from which `gpfdist` is serving files (the directory specified when you started the `gpfdist` program). Also, the \<path\> can use wildcards (or other C-style pattern matching) in the \<file\> name part of the location to denote multiple files in a directory. For example: +<dd>For readable external tables, specifies the URI of the external data source(s) to be used to populate the external table or web table. Regular readable external tables allow the `file`, `gpfdist`, and `pxf` protocols. Web external tables allow the `http` protocol. If \<port\> is omitted, the `http` and `gpfdist` protocols assume port `8080` and the `pxf` protocol assumes the \<host\> specifies a high availability Nameservice ID. If using the `gpfdist` protocol, the \<path\> is relative to the directory from which `gpfdist` is serving files (the directory specified when you started the `gpfdist` program). Also, the \<path\> can use wildcards (or other C-style pattern matching) in the \<file\> name part of the location to denote multiple files in a directory. 
For example: ``` pre 'gpfdist://filehost:8081/*' @@ -183,7 +183,7 @@ For writable external tables, specifies the URI location of the `gpfdist` proces With two `gpfdist` locations listed as in the above example, half of the segments would send their output data to the `data1.out` file and the other half to the `data2.out` file. -For the `pxf` protocol, the `LOCATION` string specifies the \<host\> and \<port\> of the PXF service, the location of the data, and the PXF plug-ins (Java classes) used to convert the data between storage format and HAWQ format. If the \<port\> is omitted, the \<host\> is taken to be the logical name for the high availability name service and the \<port\> is the value of the `pxf_service_port` configuration variable, 51200 by default. The URL parameters `FRAGMENTER`, `ACCESSOR`, and `RESOLVER` are the names of PXF plug-ins (Java classes) that convert between the external data format and HAWQ data format. The `FRAGMENTER` parameter is only used with readable external tables. PXF allows combinations of these parameters to be configured as profiles so that a single `PROFILE` parameter can be specified to access external data, for example `?PROFILE=Hive`. Additional \<custom-options\>` can be added to the LOCATION URI to further describe the external data format or storage options. For details about the plug-ins and profiles provided with PXF and information about creating custom plug-ins for other data sources see [Using PXF with Unmanaged Data](../../pxf/HawqExtensionFrameworkPXF.html).</dd> +For the `pxf` protocol, the `LOCATION` string specifies the HDFS NameNode \<host\> and the \<port\> of the PXF service, the location of the data, and the PXF profile or Java classes used to convert the data between storage format and HAWQ format. If the \<port\> is omitted, the \<host\> is taken to be the logical name for the high availability Nameservice, and the \<port\> is the value of the `pxf_service_port` configuration parameter, 51200 by default. 
The URL parameters `FRAGMENTER`, `ACCESSOR`, and `RESOLVER` are the names of PXF plug-ins (Java classes) that convert between the external data format and HAWQ data format. The `FRAGMENTER` parameter is only used with readable external tables. PXF allows combinations of these parameters to be configured as profiles so that a single `PROFILE` parameter can be specified to access external data, for example `?PROFILE=Hive`. Additional \<custom-options\>` can be added to the LOCATION URI to further describe the external data format or storage options. For details about the plug-ins and profiles provided with PXF and information about creating custom plug-ins for other data sources see [Using PXF with Unmanaged Data](../../pxf/HawqExtensionFrameworkPXF.html).</dd> <dt>EXECUTE '\<command\>' ON ... </dt> <dd>Allowed for readable web external tables or writable external tables only. For readable web external tables, specifies the OS command to be executed by the segment instances. The \<command\> can be a single OS command or a script. If \<command\> executes a script, that script must reside in the same location on all of the segment hosts and be executable by the HAWQ superuser (`gpadmin`).
