Github user dyozie commented on a diff in the pull request:
https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85371576
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -151,184 +477,120 @@ To enable HCatalog query integration in HAWQ, perform the following steps:
postgres=# GRANT ALL ON PROTOCOL pxf TO "role";
```
-3. To query a Hive table with HCatalog integration, simply query HCatalog directly from HAWQ. The query syntax is:
- ``` sql
- postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name;
- ```
+To query a Hive table with HCatalog integration, query HCatalog directly from HAWQ. The query syntax is:
+
+``` sql
+postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name;
+```
- For example:
+For example:
- ``` sql
- postgres=# SELECT * FROM hcatalog.default.sales;
- ```
-
-4. To obtain a description of a Hive table with HCatalog integration, you can use the `psql` client interface.
- - Within HAWQ, use either the `\d hcatalog.hive-db-name.hive-table-name` or `\d+ hcatalog.hive-db-name.hive-table-name` commands to describe a single table. For example, from the `psql` client interface:
-
- ``` shell
- $ psql -d postgres
- postgres=# \d hcatalog.default.test
-
- PXF Hive Table "default.test"
- Column | Type
- --------------+--------
- name | text
- type | text
- supplier_key | int4
- full_price | float8
- ```
- - Use `\d hcatalog.hive-db-name.*` to describe the whole database schema. For example:
-
- ``` shell
- postgres=# \d hcatalog.default.*
-
- PXF Hive Table "default.test"
- Column | Type
- --------------+--------
- type | text
- name | text
- supplier_key | int4
- full_price | float8
-
- PXF Hive Table "default.testabc"
- Column | Type
- --------+------
- type | text
- name | text
- ```
- - Use `\d hcatalog.*.*` to describe the whole schema:
-
- ``` shell
- postgres=# \d hcatalog.*.*
-
- PXF Hive Table "default.test"
- Column | Type
- --------------+--------
- type | text
- name | text
- supplier_key | int4
- full_price | float8
-
- PXF Hive Table "default.testabc"
- Column | Type
- --------+------
- type | text
- name | text
-
- PXF Hive Table "userdb.test"
- Column | Type
- ----------+------
- address | text
- username | text
-
- ```
-
- **Note:** When using `\d` or `\d+` commands in the `psql` HAWQ client, `hcatalog` will not be listed as a database. If you use other `psql` compatible clients, `hcatalog` will be listed as a database with a size value of `-1` since `hcatalog` is not a real database in HAWQ.
-
-5. Alternatively, you can use the **pxf\_get\_item\_fields** user-defined function (UDF) to obtain Hive table descriptions from other client interfaces or third-party applications. The UDF takes a PXF profile and a table pattern string as its input parameters.
-
- **Note:** Currently the only supported input profile is `'Hive'`.
-
- For example, the following statement returns a description of a specific table. The description includes path, itemname (table), fieldname, and fieldtype.
+``` sql
+postgres=# SELECT * FROM hcatalog.default.sales_info;
+```
+
+To obtain a description of a Hive table with HCatalog integration, you can use the `psql` client interface.
+
+- Within HAWQ, use either the `\d hcatalog.hive-db-name.hive-table-name` or `\d+ hcatalog.hive-db-name.hive-table-name` commands to describe a single table. For example, from the `psql` client interface:
+
+ ``` shell
+ $ psql -d postgres
+ ```
``` sql
- postgres=# select * from pxf_get_item_fields('Hive','default.test');
+ postgres=# \d hcatalog.default.sales_info_rcfile
```
-
- ``` pre
- path | itemname | fieldname | fieldtype
- ---------+----------+--------------+-----------
- default | test | name | text
- default | test | type | text
- default | test | supplier_key | int4
- default | test | full_price | float8
- (4 rows)
+
+ ``` pre
+ PXF Hive Table "default.sales_info_rcfile"
+ Column | Type
+ ------------------+--------
+ location | text
+ month | text
+ number_of_orders | int4
+ total_sales | float8
```
+- Use `\d hcatalog.hive-db-name.*` to describe the whole database schema, i.e., all tables in `hive-db-name`.
+- Use `\d hcatalog.*.*` to describe the whole schema, i.e., all databases and tables, as shown below.
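+
+For example:
+
+``` sql
+postgres=# \d hcatalog.default.*
+postgres=# \d hcatalog.*.*
+```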
- The following statement returns table descriptions from the default database.
+When using the `\d` or `\d+` commands in the `psql` HAWQ client, `hcatalog` will not be listed as a database. If you use other `psql`-compatible clients, `hcatalog` will be listed as a database with a size value of `-1`, since `hcatalog` is not a real database in HAWQ.
+
+Alternatively, you can use the `pxf_get_item_fields` user-defined function (UDF) to obtain Hive table descriptions from other client interfaces or third-party applications. The UDF takes a PXF profile and a table pattern string as its input parameters. **Note:** The only supported input profile at this time is `'Hive'`.
+
+- The following statement returns a description of a specific table. The description includes path, itemname (table), fieldname, and fieldtype.
``` sql
- postgres=# select * from pxf_get_item_fields('Hive','default.*');
+ postgres=# SELECT * FROM pxf_get_item_fields('Hive','default.sales_info_rcfile');
```
-
+
``` pre
- path | itemname | fieldname | fieldtype
- ---------+----------+--------------+-----------
- default | test | name | text
- default | test | type | text
- default | test | supplier_key | int4
- default | test | full_price | float8
- default | testabc | name | text
- default | testabc | type | text
- (6 rows)
+ path | itemname | fieldname | fieldtype
+ ---------+-------------------+------------------+-----------
+ default | sales_info_rcfile | location | text
+ default | sales_info_rcfile | month | text
+ default | sales_info_rcfile | number_of_orders | int4
+ default | sales_info_rcfile | total_sales | float8
```
- The following statement returns a description of the entire schema.
+- The following statement returns table descriptions from the default database.
``` sql
- postgres=# select * from pxf_get_item_fields('Hive', '*.*');
+ postgres=# SELECT * FROM pxf_get_item_fields('Hive','default.*');
```
-
- ``` pre
- path | itemname | fieldname | fieldtype
- ---------+----------+--------------+-----------
- default | test | name | text
- default | test | type | text
- default | test | supplier_key | int4
- default | test | full_price | float8
- default | testabc | name | text
- default | testabc | type | text
- userdb | test | username | text
- userdb | test | address | text
- (8 rows)
+
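+ Assuming the `default` database contains only the `sales_info` and `sales_info_rcfile` tables from this example, and that both share the column definitions shown above, the output would resemble:
+
+ ``` pre
+   path   |     itemname      |    fieldname     | fieldtype
+ ---------+-------------------+------------------+-----------
+  default | sales_info        | location         | text
+  default | sales_info        | month            | text
+  default | sales_info        | number_of_orders | int4
+  default | sales_info        | total_sales      | float8
+  default | sales_info_rcfile | location         | text
+  default | sales_info_rcfile | month            | text
+  default | sales_info_rcfile | number_of_orders | int4
+  default | sales_info_rcfile | total_sales      | float8
+ (8 rows)
+ ```
+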
+- The following statement returns a description of the entire schema.
+
+ ``` sql
+ postgres=# SELECT * FROM pxf_get_item_fields('Hive', '*.*');
```
### <a id="topic_r5k_pst_25"></a>Limitations
HCatalog integration has the following limitations:
-- HCatalog integration queries and describe commands do not support complex types; only primitive types are supported. Use PXF external tables to query complex types in Hive instead. (See [Hive Complex Types](#topic_b4v_g3n_25) for example.)
-- Even for primitive types, HCatalog metadata descriptions produced by `\d` and` \d+` are converted to HAWQ types. For example, the Hive type `tinyint` is converted to HAWQ type `int2`. (See [Data Types](../reference/HAWQDataTypes.html) for a list of data types in HAWQ.)
-- HAWQ reserves the database name "hcatalog" for system use. You cannot connect to or alter the system "hcatalog" database.
+- HCatalog integration queries and describe commands do not support complex types; only primitive types are supported. Use PXF external tables to query complex types in Hive. (See [Complex Types Example](#complex_dt_example).)
+- Even for primitive types, HCatalog metadata descriptions produced by `\d` and `\d+` are converted to HAWQ types. For example, the Hive type `tinyint` is converted to HAWQ type `int2`, as illustrated in the example following this list. (See [Data Type Mapping](#hive_primdatatypes).)
+- HAWQ reserves the database name `hcatalog` for system use. You cannot connect to or alter the system `hcatalog` database.
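+
+For example, for a hypothetical Hive table `default.typesdemo` that includes a Hive `tinyint` column named `tiny_id`, the `\d` output reports the converted HAWQ type:
+
+``` pre
+PXF Hive Table "default.typesdemo"
+ Column  | Type
+---------+------
+ tiny_id | int2
+```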
## <a id="partitionfiltering"></a>Partition Filtering
-The PXF Hive plug-in uses the Hive partitioning feature and directory structure. This enables partition exclusion on HDFS files that contain the Hive table. To use the partition filtering feature to reduce network traffic and I/O, run a PXF query using a WHERE clause that refers to a specific partition in the partitioned Hive table.
+The PXF Hive plug-in supports the Hive partitioning feature and directory structure. This enables partition exclusion on selected HDFS files comprising the Hive table. To use the partition filtering feature to reduce network traffic and I/O, run a PXF query using a `WHERE` clause that refers to a specific partition in the partitioned Hive table.
-To take advantage of PXF partition filtering push-down, name the partition fields in the external table. These names must be the same as the names stored in the Hive table. Otherwise, PXF ignores Partition filtering and the filtering is performed on the HAWQ side, impacting performance.
+To take advantage of PXF partition filtering push-down, the Hive and PXF partition field names should be the same. Otherwise, PXF ignores partition filtering and the filtering is performed on the HAWQ side, impacting performance.
-**Note:** The Hive plug-in only filters on partition columns, not on other table attributes.
+**Note:** The Hive plug-in filters only on partition columns, not on other table attributes.
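+
+For example, assuming a PXF external table named `pxf_sales_part` is defined over the `sales_part` Hive table created below, the following query filters on the `delivery_state` partition column and the filter is pushed down to PXF:
+
+``` sql
+postgres=# SELECT * FROM pxf_sales_part WHERE delivery_state = 'NEVADA';
+```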
-### <a id="example2"></a>Example
+### <a id="example2"></a>Create Partitioned Hive Table
Create a Hive table `sales_part` with two partition columns, `delivery_state` and `delivery_city:`
``` sql
-hive> CREATE TABLE sales_part (name STRING, type STRING, supplier_key INT, price DOUBLE)
-PARTITIONED BY (delivery_state STRING, delivery_city STRING)
-ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
+hive> CREATE TABLE sales_part (name string, type string, supplier_key int, price double)
+ PARTITIONED BY (delivery_state string, delivery_city string)
+ ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```
Load data into this Hive table and add some partitions:
``` sql
-hive> LOAD DATA LOCAL INPATH '/local/path/data1.txt' INTO TABLE sales_part PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'San Francisco');
-hive> LOAD DATA LOCAL INPATH '/local/path/data2.txt' INTO TABLE sales_part PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Sacramento');
-hive> LOAD DATA LOCAL INPATH '/local/path/data3.txt' INTO TABLE sales_part PARTITION(delivery_state = 'NEVADA' , delivery_city = 'Reno');
-hive> LOAD DATA LOCAL INPATH '/local/path/data4.txt' INTO TABLE sales_part PARTITION(delivery_state = 'NEVADA' , delivery_city = 'Las Vegas');
+hive> INSERT INTO TABLE sales_part PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Fresno') VALUES ('block', 'widget', 33, 15.17);
--- End diff --
Might be nice to add line breaks and spacing as with the previous code
block, so readers don't have to scroll horizontally.
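Something like:

``` sql
hive> INSERT INTO TABLE sales_part
        PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Fresno')
        VALUES ('block', 'widget', 33, 15.17);
```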