use numbered steps throughout

Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/f9c340af
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/f9c340af
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/f9c340af

Branch: refs/heads/develop
Commit: f9c340af7fc117da341e63689262fa3e6b2da0da
Parents: cd6d034
Author: Lisa Owen <[email protected]>
Authored: Tue Oct 25 14:02:27 2016 -0700
Committer: Lisa Owen <[email protected]>
Committed: Tue Oct 25 14:02:27 2016 -0700

----------------------------------------------------------------------
 pxf/HDFSFileDataPXF.html.md.erb | 321 +++++++++++++++++++----------------
 1 file changed, 174 insertions(+), 147 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/f9c340af/pxf/HDFSFileDataPXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/HDFSFileDataPXF.html.md.erb b/pxf/HDFSFileDataPXF.html.md.erb
index 06f8b4d..a853e3e 100644
--- a/pxf/HDFSFileDataPXF.html.md.erb
+++ b/pxf/HDFSFileDataPXF.html.md.erb
@@ -47,7 +47,7 @@ Perform the following steps to create data files used in subsequent exercises:
 1. Create an HDFS directory for PXF example data files:
 
     ``` shell
-     $ sudo -u hdfs hdfs dfs -mkdir -p /data/pxf_examples
+    $ sudo -u hdfs hdfs dfs -mkdir -p /data/pxf_examples
     ```
 
 2. Create a delimited plain text file:
@@ -148,36 +148,39 @@ Use the `HdfsTextSimple` profile when reading plain text delimited or .csv files
 |-------|-------------------------------------|
 | delimiter    | The delimiter character in the file. Default value is a comma `,`.|
 
+### <a id="profile_hdfstextsimple_query"></a>Query With HdfsTextSimple Profile
 
-The following SQL call uses the PXF `HdfsTextSimple` profile to create a queryable HAWQ external table from the `pxf_hdfs_simple.txt` file you created and added to HDFS in an earlier section:
+Perform the following steps to create and query external tables accessing the `pxf_hdfs_simple.txt` file you created and added to HDFS in an earlier section.
 
-``` sql
-gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
-            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
-          FORMAT 'TEXT' (delimiter=E',');
-gpadmin=# SELECT * FROM pxf_hdfs_textsimple;
-```
+1. Use the `HdfsTextSimple` profile to create a queryable HAWQ external table from the `pxf_hdfs_simple.txt` file you created and added to HDFS in an earlier section:
 
-``` pre
-   location    | month | num_orders | total_sales 
----------------+-------+------------+-------------
- Prague        | Jan   |        101 |     4875.33
- Rome          | Mar   |         87 |     1557.39
- Bangalore     | May   |        317 |     8936.99
- Beijing       | Jul   |        411 |    11600.67
-(4 rows)
-```
+    ``` sql
+    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
+                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
+              FORMAT 'TEXT' (delimiter=E',');
+    gpadmin=# SELECT * FROM pxf_hdfs_textsimple;
+    ```
 
-Create a second external table from `pxf_hdfs_simple.txt`, this time using the `CSV` `FORMAT`:
+    ``` pre
+       location    | month | num_orders | total_sales 
+    ---------------+-------+------------+-------------
+     Prague        | Jan   |        101 |     4875.33
+     Rome          | Mar   |         87 |     1557.39
+     Bangalore     | May   |        317 |     8936.99
+     Beijing       | Jul   |        411 |    11600.67
+    (4 rows)
+    ```
 
-``` sql
-gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple_csv(location text, month text, num_orders int, total_sales float8)
-            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
-          FORMAT 'CSV';
-gpadmin=# SELECT * FROM pxf_hdfs_textsimple_csv;
-```
+2. Create a second external table from `pxf_hdfs_simple.txt`, this time using the `CSV` `FORMAT`:
 
-Notice: When specifying `FORMAT 'CSV'` for a comma-separated value file, no `delimiter` formatter option is required, as comma is the default.
+    ``` sql
+    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple_csv(location text, month text, num_orders int, total_sales float8)
+                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
+              FORMAT 'CSV';
+    gpadmin=# SELECT * FROM pxf_hdfs_textsimple_csv;
+    ```
+
+    **Note:** When specifying `FORMAT 'CSV'` for a comma-separated value file, no `delimiter` formatter option is required, because comma is the default delimiter.
 
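As an aside, every `LOCATION` clause in these examples follows the form `pxf://<host>:<port>/<path-to-hdfs-file>?PROFILE=<profile-name>`. A minimal sketch of how such a URI decomposes, using Python's standard library purely for illustration (the URI value is the one from the example above):

``` python
from urllib.parse import urlparse, parse_qs

# Example PXF LOCATION URI taken from the CREATE EXTERNAL TABLE statements above
uri = "pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple"

parts = urlparse(uri)
options = parse_qs(parts.query)

print(parts.hostname)         # namenode  (the HDFS NameNode host)
print(parts.port)             # 51200     (the PXF service port)
print(parts.path)             # /data/pxf_examples/pxf_hdfs_simple.txt
print(options["PROFILE"][0])  # HdfsTextSimple
```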
 ## <a id="profile_hdfstextmulti"></a>HdfsTextMulti Profile
 
@@ -189,30 +192,39 @@ Use the `HdfsTextMulti` profile when reading plain text files with delimited sin
 |-------|-------------------------------------|
 | delimiter    | The delimiter character in the file. |
 
-The following SQL call uses the PXF `HdfsTextMulti` profile to create a queryable HAWQ external table from the `pxf_hdfs_multi.txt` file you created and added to HDFS in an earlier section:
+### <a id="profile_hdfstextmulti_query"></a>Query With HdfsTextMulti Profile
 
-``` sql
-gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textmulti(address text, month text, year int)
-            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_multi.txt?PROFILE=HdfsTextMulti')
-          FORMAT 'CSV' (delimiter=E':');
-gpadmin=# SELECT * FROM pxf_hdfs_textmulti;
-```
+Perform the following operations to create and query an external HAWQ table accessing the `pxf_hdfs_multi.txt` file you created and added to HDFS in an earlier section.
 
-``` pre
-         address          | month | year 
---------------------------+-------+------
- 4627 Star Rd.            | Sept  | 2017
- San Francisco, CA  94107           
- 113 Moon St.             | Jan   | 2018
- San Diego, CA  92093               
- 51 Belt Ct.              | Dec   | 2016
- Denver, CO  90123                  
- 93114 Radial Rd.         | Jul   | 2017
- Chicago, IL  60605                 
- 7301 Brookview Ave.      | Dec   | 2018
- Columbus, OH  43213                
-(5 rows)
-```
+1. Use the `HdfsTextMulti` profile to create a queryable external table from the `pxf_hdfs_multi.txt` file:
+
+    ``` sql
+    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textmulti(address text, month text, year int)
+                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_multi.txt?PROFILE=HdfsTextMulti')
+              FORMAT 'CSV' (delimiter=E':');
+    ```
+    
+2. Query the `pxf_hdfs_textmulti` table:
+
+    ``` sql
+    gpadmin=# SELECT * FROM pxf_hdfs_textmulti;
+    ```
+
+    ``` pre
+             address          | month | year 
+    --------------------------+-------+------
+     4627 Star Rd.            | Sept  | 2017
+     San Francisco, CA  94107           
+     113 Moon St.             | Jan   | 2018
+     San Diego, CA  92093               
+     51 Belt Ct.              | Dec   | 2016
+     Denver, CO  90123                  
+     93114 Radial Rd.         | Jul   | 2017
+     Chicago, IL  60605                 
+     7301 Brookview Ave.      | Dec   | 2018
+     Columbus, OH  43213                
+    (5 rows)
+    ```
 
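The `HdfsTextMulti` profile can return the multi-line address values above because each embedded-newline field is quoted in the source file. A rough sketch, using Python's `csv` module purely for illustration, of how one colon-delimited record with a quoted multi-line field splits into three columns:

``` python
import csv
import io

# One pxf_hdfs_multi.txt-style record: a quoted address field containing an
# embedded newline, then month and year, with ':' as the field delimiter
record = '"4627 Star Rd.\nSan Francisco, CA  94107":Sept:2017\n'

reader = csv.reader(io.StringIO(record), delimiter=":", quotechar='"')
address, month, year = next(reader)

print(repr(address))  # '4627 Star Rd.\nSan Francisco, CA  94107'
print(month, year)    # Sept 2017
```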
 ## <a id="profile_hdfsavro"></a>Avro Profile
 
@@ -252,7 +264,7 @@ The `Avro` profile supports the following \<custom-options\>:
 | RECORDKEY_DELIM | The delimiter character(s) to place between the field name and value of a record entry when PXF maps an Avro complex data type to a text column. The default is a colon `:` character. |
 
 
-### <a id="topic_tr3_dpg_ts__section_m2p_ztg_ts"></a>Avro Schemas
+### <a id="topic_tr3_dpg_ts__section_m2p_ztg_ts"></a>Avro Schemas and Data
 
 Avro schemas are defined using JSON, and composed of the same primitive and complex types identified in the data mapping section above. Avro schema files typically have a `.avsc` suffix.
 
@@ -267,133 +279,148 @@ The examples in this section will be operating on Avro data fields with the foll
 - address - record comprised of street number (int), street name (string), and city (string)
 - relationship - enumerated type
 
-Create an Avro schema file to represent the above schema:
+#### <a id="topic_tr3_dpg_ts__section_m2p_ztg_ts_99"></a>Create Sample Schema
 
-``` shell
-$ vi /tmp/avro_schema.avsc
-```
+Perform the following operations to create an Avro schema to represent the example schema described above.
 
-Copy and paste the following text into `avro_schema.avsc`:
-
-``` json
-{
-"type" : "record",
-  "name" : "example_schema",
-  "namespace" : "com.example",
-  "fields" : [ {
-    "name" : "id",
-    "type" : "long",
-    "doc" : "Id of the user account"
-  }, {
-    "name" : "username",
-    "type" : "string",
-    "doc" : "Name of the user account"
-  }, {
-    "name" : "followers",
-    "type" : {"type": "array", "items": "string"},
-    "doc" : "Users followers"
-  }, {
-    "name": "fmap",
-    "type": {"type": "map", "values": "long"}
-  }, {
-    "name": "relationship",
-    "type": {
-        "type": "enum",
-        "name": "relationshipEnum",
-        "symbols": ["MARRIED","LOVE","FRIEND","COLLEAGUE","STRANGER","ENEMY"]
-    }
-  }, {
-    "name": "address",
-    "type": {
-        "type": "record",
-        "name": "addressRecord",
-        "fields": [
-            {"name":"number", "type":"int"},
-            {"name":"street", "type":"string"},
-            {"name":"city", "type":"string"}]
+1. Create a file named `avro_schema.avsc`:
+
+    ``` shell
+    $ vi /tmp/avro_schema.avsc
+    ```
+
+2. Copy and paste the following text into `avro_schema.avsc`:
+
+    ``` json
+    {
+    "type" : "record",
+      "name" : "example_schema",
+      "namespace" : "com.example",
+      "fields" : [ {
+        "name" : "id",
+        "type" : "long",
+        "doc" : "Id of the user account"
+      }, {
+        "name" : "username",
+        "type" : "string",
+        "doc" : "Name of the user account"
+      }, {
+        "name" : "followers",
+        "type" : {"type": "array", "items": "string"},
+        "doc" : "Users followers"
+      }, {
+        "name": "fmap",
+        "type": {"type": "map", "values": "long"}
+      }, {
+        "name": "relationship",
+        "type": {
+            "type": "enum",
+            "name": "relationshipEnum",
+            "symbols": ["MARRIED","LOVE","FRIEND","COLLEAGUE","STRANGER","ENEMY"]
+        }
+      }, {
+        "name": "address",
+        "type": {
+            "type": "record",
+            "name": "addressRecord",
+            "fields": [
+                {"name":"number", "type":"int"},
+                {"name":"street", "type":"string"},
+                {"name":"city", "type":"string"}]
+        }
+      } ],
+      "doc" : "A basic schema for storing messages"
     }
-  } ],
-  "doc:" : "A basic schema for storing messages"
-}
-```
+    ```
 
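Because an Avro schema is plain JSON, it can be sanity-checked with any JSON parser before use. A minimal sketch, shown in Python for illustration only and using an abridged copy of the schema above:

``` python
import json

# Abridged copy of the avro_schema.avsc content shown above
schema_text = """
{
  "type": "record",
  "name": "example_schema",
  "namespace": "com.example",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "username", "type": "string"},
    {"name": "followers", "type": {"type": "array", "items": "string"}}
  ]
}
"""

schema = json.loads(schema_text)  # malformed JSON raises an error here
field_names = [field["name"] for field in schema["fields"]]
print(field_names)  # ['id', 'username', 'followers']
```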
-### <a id="topic_tr3_dpg_ts__section_spk_15g_ts"></a>Sample Avro Data (JSON)
+#### <a id="topic_tr3_dpg_ts__section_spk_15g_ts"></a>Create Sample Avro Data File (JSON)
 
-Create a text file named `pxf_hdfs_avro.txt`:
+Perform the following steps to create a sample Avro data file conforming to the above schema.
 
-``` shell
-$ vi /tmp/pxf_hdfs_avro.txt
-```
+1.  Create a text file named `pxf_hdfs_avro.txt`:
+
+    ``` shell
+    $ vi /tmp/pxf_hdfs_avro.txt
+    ```
 
-Enter the following data into `pxf_hdfs_avro.txt`:
+2. Enter the following data into `pxf_hdfs_avro.txt`:
 
-``` pre
-{"id":1, "username":"john","followers":["kate", "santosh"], "relationship": "FRIEND", "fmap": {"kate":10,"santosh":4}, "address":{"number":1, "street":"renaissance drive", "city":"san jose"}}
+    ``` pre
+    {"id":1, "username":"john","followers":["kate", "santosh"], "relationship": "FRIEND", "fmap": {"kate":10,"santosh":4}, "address":{"number":1, "street":"renaissance drive", "city":"san jose"}}
     
-{"id":2, "username":"jim","followers":["john", "pam"], "relationship": "COLLEAGUE", "fmap": {"john":3,"pam":3}, "address":{"number":9, "street":"deer creek", "city":"palo alto"}}
-```
+    {"id":2, "username":"jim","followers":["john", "pam"], "relationship": "COLLEAGUE", "fmap": {"john":3,"pam":3}, "address":{"number":9, "street":"deer creek", "city":"palo alto"}}
+    ```
 
-The sample data uses a comma `,` to separate top level records and a colon `:` to separate map/key values and record field name/values.
+    The sample data uses a comma `,` to separate top level records and a colon `:` to separate map/key values and record field name/values.
 
-Convert the text file to Avro format. There are various ways to perform the conversion programmatically and via the command line. In this example, we use the [Java Avro tools](http://avro.apache.org/releases.html), and the jar file resides in the current directory:
+3. Convert the text file to Avro format. There are various ways to perform the conversion programmatically and via the command line. In this example, we use the [Java Avro tools](http://avro.apache.org/releases.html), and the jar file resides in the current directory:
 
-``` shell
-$ java -jar ./avro-tools-1.8.1.jar fromjson --schema-file /tmp/avro_schema.avsc /tmp/pxf_hdfs_avro.txt > /tmp/pxf_hdfs_avro.avro
-```
+    ``` shell
+    $ java -jar ./avro-tools-1.8.1.jar fromjson --schema-file /tmp/avro_schema.avsc /tmp/pxf_hdfs_avro.txt > /tmp/pxf_hdfs_avro.avro
+    ```
 
-The generated Avro binary data file is written to `/tmp/pxf_hdfs_avro.avro`. Copy this file to HDFS:
+    The generated Avro binary data file is written to `/tmp/pxf_hdfs_avro.avro`.
+
+4. Copy the generated file to HDFS:
 
-``` shell
-$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_avro.avro /data/pxf_examples/
-```
-### <a id="topic_avro_querydata"></a>Querying Avro Data
+    ``` shell
+    $ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_avro.avro /data/pxf_examples/
+    ```
+    
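Before converting, it can be worth confirming that each JSON record supplies exactly the fields the schema declares. A hypothetical pre-flight check, sketched in Python for illustration only:

``` python
import json

# Field names declared in avro_schema.avsc
expected = {"id", "username", "followers", "fmap", "relationship", "address"}

# First record from pxf_hdfs_avro.txt
record = ('{"id":1, "username":"john","followers":["kate", "santosh"], '
          '"relationship": "FRIEND", "fmap": {"kate":10,"santosh":4}, '
          '"address":{"number":1, "street":"renaissance drive", "city":"san jose"}}')

keys = set(json.loads(record))
print(expected - keys)  # set()  -> no declared field is missing
print(keys - expected)  # set()  -> no undeclared field is present
```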
+### <a id="topic_avro_querydata"></a>Query With Avro Profile
 
-Create a queryable external table from this Avro file:
+Perform the following steps to create and query an external table accessing the `pxf_hdfs_avro.avro` file you added to HDFS in the previous section. When creating the table:
 
 -  Map the top-level primitive fields, `id` (type long) and `username` (type string), to their equivalent HAWQ types (bigint and text).
 -  Map the remaining complex fields to type text.
--  Explicitly set the record, map, and collection delimiters using the Avro profile custom options:
+-  Explicitly set the record, map, and collection delimiters using the Avro profile custom options.
 
-``` sql
-gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_avro(id bigint, username text, followers text, fmap text, relationship text, address text)
-            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_avro.avro?PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:&RECORDKEY_DELIM=:')
-          FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
-```
 
-A simple query of the external table shows the components of the complex type data separated with delimiters:
+1. Use the `Avro` profile to create a queryable external table from the `pxf_hdfs_avro.avro` file:
 
-``` sql
-gpadmin=# SELECT * FROM pxf_hdfs_avro;
-```
+    ``` sql
+    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_avro(id bigint, username text, followers text, fmap text, relationship text, address text)
+                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_avro.avro?PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:&RECORDKEY_DELIM=:')
+              FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
+    ```
 
-``` pre
- id | username |   followers    |        fmap         | relationship |                      address                      
-----+----------+----------------+---------------------+--------------+---------------------------------------------------
-  1 | john     | [kate,santosh] | {kate:10,santosh:4} | FRIEND       | {number:1,street:renaissance drive,city:san jose}
-  2 | jim      | [john,pam]     | {pam:3,john:3}      | COLLEAGUE    | {number:9,street:deer creek,city:palo alto}
-(2 rows)
-```
+2. Perform a simple query of the `pxf_hdfs_avro` table:
 
-Process the delimited components in the text columns as necessary for your application. For example, the following command uses the HAWQ internal `string_to_array` function to convert entries in the `followers` field to a text array column in a new view.
+    ``` sql
+    gpadmin=# SELECT * FROM pxf_hdfs_avro;
+    ```
 
-``` sql
-gpadmin=# CREATE VIEW followers_view AS 
+    ``` pre
+     id | username |   followers    |        fmap         | relationship |                      address                      
+    ----+----------+----------------+---------------------+--------------+---------------------------------------------------
+      1 | john     | [kate,santosh] | {kate:10,santosh:4} | FRIEND       | {number:1,street:renaissance drive,city:san jose}
+      2 | jim      | [john,pam]     | {pam:3,john:3}      | COLLEAGUE    | {number:9,street:deer creek,city:palo alto}
+    (2 rows)
+    ```
+
+    The query output shows the components of the complex-type data separated with the specified delimiters.
+
+3. Process the delimited components in the text columns as necessary for your application. For example, the following command uses the HAWQ internal `string_to_array` function to convert entries in the `followers` field to a text array column in a new view.
+
+    ``` sql
+    gpadmin=# CREATE VIEW followers_view AS 
   SELECT username, address, string_to_array(substring(followers FROM 2 FOR (char_length(followers) - 2)), ',')::text[] 
-    AS followers 
-  FROM pxf_hdfs_avro;
-```
+        AS followers 
+      FROM pxf_hdfs_avro;
+    ```
 
-Query the view to filter rows based on whether a particular follower appears in the array:
+4. Query the view to filter rows based on whether a particular follower appears in the array:
 
-```
-gpadmin=# SELECT username, address FROM followers_view WHERE followers @> '{john}';
-```
+    ``` sql
+    gpadmin=# SELECT username, address FROM followers_view WHERE followers @> '{john}';
+    ```
 
-``` pre
- username |                   address                   
-----------+---------------------------------------------
- jim      | {number:9,street:deer creek,city:palo alto}
-```
+    ``` pre
+     username |                   address                   
+    ----------+---------------------------------------------
+     jim      | {number:9,street:deer creek,city:palo alto}
+    ```
 
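The delimiter-separated text that PXF emits for complex types can also be post-processed outside HAWQ. A sketch in Python with hypothetical helper names (not part of PXF or HAWQ), parsing the `fmap` output with the `COLLECTION_DELIM` and `MAPKEY_DELIM` characters chosen above, and mimicking the view's `string_to_array` call:

``` python
# Hypothetical helpers; names and behavior are illustrative only

def parse_pxf_map(text, collection_delim=",", mapkey_delim=":"):
    """Parse PXF map output such as '{kate:10,santosh:4}' into a dict."""
    entries = text.strip("{}").split(collection_delim)
    return dict(entry.split(mapkey_delim, 1) for entry in entries)

def followers_to_array(text, collection_delim=","):
    """Mimic the string_to_array view: '[john,pam]' -> ['john', 'pam']."""
    return text.strip("[]").split(collection_delim)

fmap = parse_pxf_map("{kate:10,santosh:4}")
followers = followers_to_array("[john,pam]")

print(fmap)                 # {'kate': '10', 'santosh': '4'}
print("john" in followers)  # True, like the followers @> '{john}' filter
```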
 ## <a id="accessdataonahavhdfscluster"></a>Accessing HDFS Data in a High Availability HDFS Cluster
 
