Github user dyozie commented on a diff in the pull request:
https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85371576
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -151,184 +477,120 @@ To enable HCatalog query integration in HAWQ, perform the following steps:
postgres=# GRANT ALL ON PROTOCOL pxf TO "role";
```
-3. To query a Hive table with HCatalog integration, simply query HCatalog directly from HAWQ. The query syntax is:
- ``` sql
- postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name;
- ```
+To query a Hive table with HCatalog integration, query HCatalog directly from HAWQ. The query syntax is:
+
+``` sql
+postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name;
+```
- For example:
+For example:
- ``` sql
- postgres=# SELECT * FROM hcatalog.default.sales;
- ```
-
-4. To obtain a description of a Hive table with HCatalog integration, you can use the `psql` client interface.
- - Within HAWQ, use either the `\d hcatalog.hive-db-name.hive-table-name` or `\d+ hcatalog.hive-db-name.hive-table-name` commands to describe a single table. For example, from the `psql` client interface:
-
- ``` shell
- $ psql -d postgres
- postgres=# \d hcatalog.default.test
-
- PXF Hive Table "default.test"
- Column | Type
- --------------+--------
- name | text
- type | text
- supplier_key | int4
- full_price | float8
- ```
- - Use `\d hcatalog.hive-db-name.*` to describe the whole database schema. For example:
-
- ``` shell
- postgres=# \d hcatalog.default.*
-
- PXF Hive Table "default.test"
- Column | Type
- --------------+--------
- type | text
- name | text
- supplier_key | int4
- full_price | float8
-
- PXF Hive Table "default.testabc"
- Column | Type
- --------+------
- type | text
- name | text
- ```
- - Use `\d hcatalog.*.*` to describe the whole schema:
-
- ``` shell
- postgres=# \d hcatalog.*.*
-
- PXF Hive Table "default.test"
- Column | Type
- --------------+--------
- type | text
- name | text
- supplier_key | int4
- full_price | float8
-
- PXF Hive Table "default.testabc"
- Column | Type
- --------+------
- type | text
- name | text
-
- PXF Hive Table "userdb.test"
- Column | Type
- ----------+------
- address | text
- username | text
-
- ```
-
- **Note:** When using `\d` or `\d+` commands in the `psql` HAWQ client, `hcatalog` will not be listed as a database. If you use other `psql` compatible clients, `hcatalog` will be listed as a database with a size value of `-1` since `hcatalog` is not a real database in HAWQ.
-
-5. Alternatively, you can use the **pxf\_get\_item\_fields** user-defined function (UDF) to obtain Hive table descriptions from other client interfaces or third-party applications. The UDF takes a PXF profile and a table pattern string as its input parameters.
-
- **Note:** Currently the only supported input profile is `'Hive'`.
-
- For example, the following statement returns a description of a specific table. The description includes path, itemname (table), fieldname, and fieldtype.
+``` sql
+postgres=# SELECT * FROM hcatalog.default.sales_info;
+```
+
+To obtain a description of a Hive table with HCatalog integration, you can use the `psql` client interface.
+
+- Within HAWQ, use either the `\d hcatalog.hive-db-name.hive-table-name` or `\d+ hcatalog.hive-db-name.hive-table-name` commands to describe a single table. For example, from the `psql` client interface:
+
+ ``` shell
+ $ psql -d postgres
+ ```
``` sql
- postgres=# select * from pxf_get_item_fields('Hive','default.test');
+ postgres=# \d hcatalog.default.sales_info_rcfile
```
-
- ``` pre
- path | itemname | fieldname | fieldtype
- ---------+----------+--------------+-----------
- default | test | name | text
- default | test | type | text
- default | test | supplier_key | int4
- default | test | full_price | float8
- (4 rows)
+
+ ``` pre
+ PXF Hive Table "default.sales_info_rcfile"
+ Column | Type
+ ------------------+--------
+ location | text
+ month | text
+ number_of_orders | int4
+ total_sales | float8
```
+- Use `\d hcatalog.hive-db-name.*` to describe the whole database schema, i.e., all tables in `hive-db-name`.
+- Use `\d hcatalog.*.*` to describe the whole schema, i.e., all databases and tables, as shown below.
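+
+For example:
+
+``` sql
+postgres=# \d hcatalog.default.*
+postgres=# \d hcatalog.*.*
+```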
- The following statement returns table descriptions from the default database.
+When using the `\d` or `\d+` commands in the `psql` HAWQ client, `hcatalog` will not be listed as a database. If you use other `psql`-compatible clients, `hcatalog` will be listed as a database with a size value of `-1`, since `hcatalog` is not a real database in HAWQ.
+
+Alternatively, you can use the `pxf_get_item_fields` user-defined function (UDF) to obtain Hive table descriptions from other client interfaces or third-party applications. The UDF takes a PXF profile and a table pattern string as its input parameters. **Note:** The only supported input profile at this time is `'Hive'`.
+
+- The following statement returns a description of a specific table. The description includes path, itemname (table), fieldname, and fieldtype.
``` sql
- postgres=# select * from pxf_get_item_fields('Hive','default.*');
+ postgres=# SELECT * FROM pxf_get_item_fields('Hive','default.sales_info_rcfile');
```
-
+
``` pre
- path | itemname | fieldname | fieldtype
- ---------+----------+--------------+-----------
- default | test | name | text
- default | test | type | text
- default | test | supplier_key | int4
- default | test | full_price | float8
- default | testabc | name | text
- default | testabc | type | text
- (6 rows)
+ path | itemname | fieldname | fieldtype
+ ---------+-------------------+------------------+-----------
+ default | sales_info_rcfile | location | text
+ default | sales_info_rcfile | month | text
+ default | sales_info_rcfile | number_of_orders | int4
+ default | sales_info_rcfile | total_sales | float8
```
- The following statement returns a description of the entire schema.
+- The following statement returns table descriptions from the default database.
``` sql
- postgres=# select * from pxf_get_item_fields('Hive', '*.*');
+ postgres=# SELECT * FROM pxf_get_item_fields('Hive','default.*');
```
-
- ``` pre
- path | itemname | fieldname | fieldtype
- ---------+----------+--------------+-----------
- default | test | name | text
- default | test | type | text
- default | test | supplier_key | int4
- default | test | full_price | float8
- default | testabc | name | text
- default | testabc | type | text
- userdb | test | username | text
- userdb | test | address | text
- (8 rows)
+
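+ Assuming the `default` database contains only the `sales_info` and `sales_info_rcfile` tables from this example, and that both share the column definitions shown above, the output would resemble:
+
+ ``` pre
+   path   |     itemname      |    fieldname     | fieldtype
+ ---------+-------------------+------------------+-----------
+  default | sales_info        | location         | text
+  default | sales_info        | month            | text
+  default | sales_info        | number_of_orders | int4
+  default | sales_info        | total_sales      | float8
+  default | sales_info_rcfile | location         | text
+  default | sales_info_rcfile | month            | text
+  default | sales_info_rcfile | number_of_orders | int4
+  default | sales_info_rcfile | total_sales      | float8
+ (8 rows)
+ ```
+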
+- The following statement returns a description of the entire schema.
+
+ ``` sql
+ postgres=# SELECT * FROM pxf_get_item_fields('Hive', '*.*');
```
### <a id="topic_r5k_pst_25"></a>Limitations
HCatalog integration has the following limitations:
-- HCatalog integration queries and describe commands do not support complex types; only primitive types are supported. Use PXF external tables to query complex types in Hive instead. (See [Hive Complex Types](#topic_b4v_g3n_25) for example.)
-- Even for primitive types, HCatalog metadata descriptions produced by `\d` and` \d+` are converted to HAWQ types. For example, the Hive type `tinyint` is converted to HAWQ type `int2`. (See [Data Types](../reference/HAWQDataTypes.html) for a list of data types in HAWQ.)
-- HAWQ reserves the database name "hcatalog" for system use. You cannot connect to or alter the system "hcatalog" database.
+- HCatalog integration queries and describe commands do not support complex types; only primitive types are supported. Use PXF external tables to query complex types in Hive. (See [Complex Types Example](#complex_dt_example).)
+- Even for primitive types, HCatalog metadata descriptions produced by `\d` and `\d+` are converted to HAWQ types. For example, the Hive type `tinyint` is converted to HAWQ type `int2`, as illustrated in the example following this list. (See [Data Type Mapping](#hive_primdatatypes).)
+- HAWQ reserves the database name `hcatalog` for system use. You cannot connect to or alter the system `hcatalog` database.
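+
+For example, for a hypothetical Hive table `default.typesdemo` that includes a Hive `tinyint` column named `tiny_id`, the `\d` output reports the converted HAWQ type:
+
+``` pre
+PXF Hive Table "default.typesdemo"
+ Column  | Type
+---------+------
+ tiny_id | int2
+```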
## <a id="partitionfiltering"></a>Partition Filtering
-The PXF Hive plug-in uses the Hive partitioning feature and directory structure. This enables partition exclusion on HDFS files that contain the Hive table. To use the partition filtering feature to reduce network traffic and I/O, run a PXF query using a WHERE clause that refers to a specific partition in the partitioned Hive table.
+The PXF Hive plug-in supports the Hive partitioning feature and directory structure. This enables partition exclusion on selected HDFS files comprising the Hive table. To use the partition filtering feature to reduce network traffic and I/O, run a PXF query using a `WHERE` clause that refers to a specific partition in the partitioned Hive table.
-To take advantage of PXF partition filtering push-down, name the partition fields in the external table. These names must be the same as the names stored in the Hive table. Otherwise, PXF ignores Partition filtering and the filtering is performed on the HAWQ side, impacting performance.
+To take advantage of PXF partition filtering push-down, the Hive and PXF partition field names should be the same. Otherwise, PXF ignores partition filtering and the filtering is performed on the HAWQ side, impacting performance.
-**Note:** The Hive plug-in only filters on partition columns, not on other table attributes.
+**Note:** The Hive plug-in filters only on partition columns, not on other table attributes.
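+
+For example, assuming a PXF external table named `pxf_sales_part` is defined over the `sales_part` Hive table created below, the following query filters on the `delivery_state` partition column and the filter is pushed down to PXF:
+
+``` sql
+postgres=# SELECT * FROM pxf_sales_part WHERE delivery_state = 'NEVADA';
+```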
-### <a id="example2"></a>Example
+### <a id="example2"></a>Create Partitioned Hive Table
Create a Hive table `sales_part` with two partition columns, `delivery_state` and `delivery_city:`
``` sql
-hive> CREATE TABLE sales_part (name STRING, type STRING, supplier_key INT, price DOUBLE)
-PARTITIONED BY (delivery_state STRING, delivery_city STRING)
-ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
+hive> CREATE TABLE sales_part (name string, type string, supplier_key int, price double)
+ PARTITIONED BY (delivery_state string, delivery_city string)
+ ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```
Load data into this Hive table and add some partitions:
``` sql
-hive> LOAD DATA LOCAL INPATH '/local/path/data1.txt' INTO TABLE sales_part PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'San Francisco');
-hive> LOAD DATA LOCAL INPATH '/local/path/data2.txt' INTO TABLE sales_part PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Sacramento');
-hive> LOAD DATA LOCAL INPATH '/local/path/data3.txt' INTO TABLE sales_part PARTITION(delivery_state = 'NEVADA' , delivery_city = 'Reno');
-hive> LOAD DATA LOCAL INPATH '/local/path/data4.txt' INTO TABLE sales_part PARTITION(delivery_state = 'NEVADA' , delivery_city = 'Las Vegas');
+hive> INSERT INTO TABLE sales_part PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Fresno') VALUES ('block', 'widget', 33, 15.17);
--- End diff --
Might be nice to add line breaks and spacing as with the previous code
block, so readers don't have to scroll horizontally.
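Something like:

``` sql
hive> INSERT INTO TABLE sales_part
        PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Fresno')
        VALUES ('block', 'widget', 33, 15.17);
```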