remove SequenceWritable, use namenode for host
Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/fd029d56
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/fd029d56
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/fd029d56

Branch: refs/heads/develop
Commit: fd029d568589f5a4e2461d92437963d97f7d3198
Parents: 5a941a7
Author: Lisa Owen <[email protected]>
Authored: Thu Oct 20 12:20:21 2016 -0700
Committer: Lisa Owen <[email protected]>
Committed: Thu Oct 20 12:20:21 2016 -0700

----------------------------------------------------------------------
 pxf/HDFSFileDataPXF.html.md.erb | 62 ++++--------------------------------
 1 file changed, 7 insertions(+), 55 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/fd029d56/pxf/HDFSFileDataPXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/HDFSFileDataPXF.html.md.erb b/pxf/HDFSFileDataPXF.html.md.erb
index 2f87037..9914ca9 100644
--- a/pxf/HDFSFileDataPXF.html.md.erb
+++ b/pxf/HDFSFileDataPXF.html.md.erb
@@ -2,7 +2,7 @@
 title: Accessing HDFS File Data
 ---
 
-HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value text files. The HDFS plug-in also supports Avro and SequenceFile binary formats.
+HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value text files. The HDFS plug-in also supports the Avro binary format.
 
 This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store.
@@ -15,10 +15,9 @@ Before working with HDFS file data using HAWQ and PXF, ensure that:
 
 ## <a id="hdfsplugin_fileformats"></a>HDFS File Formats
 
-The PXF HDFS plug-in supports the following file formats:
+The PXF HDFS plug-in supports reading the following file formats:
 
 - TextFile - comma-separated value (.csv) or delimited format plain text file
-- SequenceFile - flat file consisting of binary key/value pairs
 - Avro - JSON-defined, schema-based data serialization format
 
 The PXF HDFS plug-in includes the following profiles to support the file formats listed above:
@@ -26,7 +25,6 @@ The PXF HDFS plug-in includes the following profiles to support the file formats
 
 - `HdfsTextSimple` - text files
 - `HdfsTextMulti` - text files with embedded line feeds
 - `Avro` - Avro files
-- `SequenceWritable` - SequenceFile (write only?)
 
 ## <a id="hdfsplugin_cmdline"></a>HDFS Shell Commands
@@ -109,7 +107,7 @@ $ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_tm.txt /data/pxf_examples/
 
 You will use these HDFS files in later sections.
 
 ## <a id="hdfsplugin_queryextdata"></a>Querying External HDFS Data
 
-The PXF HDFS plug-in supports several profiles. These include `HdfsTextSimple`, `HdfsTextMulti`, `Avro`, and `SequenceWritable`.
+The PXF HDFS plug-in supports several profiles. These include `HdfsTextSimple`, `HdfsTextMulti`, and `Avro`.
 
 Use the following syntax to create a HAWQ external table representing HDFS data:
@@ -117,7 +115,7 @@ Use the following syntax to create a HAWQ external table representing HDFS data:
 ``` sql
 CREATE EXTERNAL TABLE <table_name>
     ( <column_name> <data_type> [, ...] | LIKE <other_table> )
 LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
-    ?PROFILE=HdfsTextSimple|HdfsTextMulti|Avro|SequenceWritable[&<custom-option>=<value>[...]]')
+    ?PROFILE=HdfsTextSimple|HdfsTextMulti|Avro[&<custom-option>=<value>[...]]')
 FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
 ```
@@ -127,12 +125,11 @@ HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](..
 |-------|-------------------------------------|
 | \<host\>[:\<port\>] | The HDFS NameNode and port. |
 | \<path-to-hdfs-file\> | The path to the file in the HDFS data store. |
-| PROFILE | The `PROFILE` keyword must specify one of the values `HdfsTextSimple`, `HdfsTextMulti`, `SequenceWritable`, or `Avro`. |
+| PROFILE | The `PROFILE` keyword must specify one of the values `HdfsTextSimple`, `HdfsTextMulti`, or `Avro`. |
 | \<custom-option\> | \<custom-option\> is profile-specific. Profile-specific options are discussed in the relevant profile topic later in this section. |
 | FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> references a plain text delimited file. |
 | FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` and `HdfsTextMulti` profiles when \<path-to-hdfs-file\> references a comma-separated value file. |
 | FORMAT 'CUSTOM' | Use the `CUSTOM` `FORMAT` with the `Avro` profile. The `Avro` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_import')` \<formatting-property\>. |
-| FORMAT 'CUSTOM' | Use the`CUSTOM` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export')` \<formatting-property\> |
 | \<formatting-properties\> | \<formatting-properties\> are profile-specific. Profile-specific formatting options are discussed in the relevant profile topic later in this section. |
 
 *Note*: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
@@ -192,7 +189,7 @@ The following SQL call uses the PXF `HdfsTextMulti` profile to create a queryabl
 
 ``` sql
 gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textmulti(address text, month text, year int)
-            LOCATION ('pxf://sandbox.hortonworks.com:51200/data/pxf_examples/pxf_hdfs_tm.txt?PROFILE=HdfsTextMulti')
+            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_tm.txt?PROFILE=HdfsTextMulti')
             FORMAT 'CSV' (delimiter=E':');
 gpadmin=# SELECT * FROM pxf_hdfs_textmulti;
 ```
@@ -358,7 +355,7 @@ Create a queryable external table from this Avro file:
 
 ``` sql
 gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_avro(id bigint, username text, followers text, fmap text, relationship text, address text)
-            LOCATION ('pxf://sandbox.hortonworks.com:51200/data/pxf_examples/pxf_hdfs_avro.avro?PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:&RECORDKEY_DELIM=:')
+            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_avro.avro?PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:&RECORDKEY_DELIM=:')
             FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
 ```
@@ -393,51 +390,6 @@ gpadmin=# SELECT username, address FROM followers_view WHERE followers @> '{john
  jim | {number:9,street:deer creek,city:palo alto}
 ```
 
-## <a id="profile_hdfsseqwritable"></a>SequenceWritable Profile
-
-Use the `SequenceWritable` profile when writing SequenceFile format files. Files of this type consist of binary key/value pairs. Sequence files are a common data transfer format between MapReduce jobs.
-
-The `SequenceWritable` profile supports the following \<custom-options\>:
-
-| Keyword | Value Description |
-|-------|-------------------------------------|
-| COMPRESSION_CODEC | The compression codec Java class name. If this option is not provided, no data compression is performed. |
-| COMPRESSION_TYPE | The compression type of the sequence file; supported values are `RECORD` (the default) or `BLOCK`. |
-| DATA-SCHEMA | The name of the writer serialization class. The jar file in which this class resides must be in the PXF class path. This option has no default value. |
-| THREAD-SAFE | Boolean value determining if a table query can run in multi-thread mode. Default value is `TRUE` - requests can run in multi-thread mode. When set to `FALSE`, requests will be handled in a single thread. |
-
-???? MORE HERE
-
-??? ADDRESS SERIALIZATION
-
-
-## <a id="recordkeyinkey-valuefileformats"></a>Reading the Record Key
-
-Sequence file and other file formats that store rows in a key-value format can access the key value through HAWQ by using the `recordkey` keyword as a field name.
-
-The field type of `recordkey` must correspond to the key type, much as the other fields must match the HDFS data.
-
-`recordkey` can be any of the following Hadoop types:
-
-- BooleanWritable
-- ByteWritable
-- DoubleWritable
-- FloatWritable
-- IntWritable
-- LongWritable
-- Text
-
-### <a id="example1"></a>Example
-
-A data schema `Babies.class` contains three fields: name (text), birthday (text), weight (float). An external table definition for this schema must include these three fields, and can either include or ignore the `recordkey`.
-
-``` sql
-gpadmin=# CREATE EXTERNAL TABLE babies_1940 (recordkey int, name text, birthday text, weight float)
-            LOCATION ('pxf://namenode:51200/babies_1940s?PROFILE=SequenceWritable&DATA-SCHEMA=Babies')
-            FORMAT 'CUSTOM' (formatter='pxfwritable_import');
-gpadmin=# SELECT * FROM babies_1940;
-```
-
 ## <a id="accessdataonahavhdfscluster"></a>Accessing HDFS Data in a High Availability HDFS Cluster
 
 To access external HDFS data in a High Availability HDFS cluster, change the URI LOCATION clause to use \<HA-nameservice\> rather than \<host\>[:\<port\>].
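
[Editor's illustration, not part of the commit] The final hunk retains the High Availability note, which says to substitute an \<HA-nameservice\> for \<host\>[:\<port\>] in the LOCATION URI. As a sketch of what that looks like, the `HdfsTextMulti` example from this same file could be rewritten against an HA cluster; the nameservice name `hdfscluster` is an assumed placeholder, while the table columns, file path, profile, and format options come from the example in the diff above.

``` sql
-- Sketch only: "hdfscluster" is a hypothetical HDFS HA nameservice name
-- (configured in hdfs-site.xml), used in place of namenode:51200.
gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textmulti_ha(address text, month text, year int)
            LOCATION ('pxf://hdfscluster/data/pxf_examples/pxf_hdfs_tm.txt?PROFILE=HdfsTextMulti')
            FORMAT 'CSV' (delimiter=E':');
gpadmin=# SELECT * FROM pxf_hdfs_textmulti_ha;
```

Note that no port is given with the nameservice: the HA URI resolves the active NameNode through HDFS client configuration rather than a fixed host and port.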
