GitHub user kavinderd commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85361807
  
    --- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
    @@ -2,506 +2,449 @@
     title: Accessing HDFS File Data
     ---
     
    -## <a id="installingthepxfhdfsplugin"></a>Prerequisites
    +HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
     
    -Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
    +This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
     
    --   Test PXF on HDFS before connecting to Hive or HBase.
    --   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
    +## <a id="hdfsplugin_prereq"></a>Prerequisites
     
    -## <a id="syntax1"></a>Syntax
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
     
    -The syntax for creating an external HDFS file is as follows: 
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing 
PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services and write permissions have been restricted to specific users.
     
    -``` sql
    -CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
    -    ( column_name data_type [, ...] | LIKE other_table )
    -LOCATION ('pxf://host[:port]/path-to-data?<pxf 
parameters>[&custom-option=value...]')
    -      FORMAT '[TEXT | CSV | CUSTOM]' (<formatting_properties>);
    -```
    +## <a id="hdfsplugin_fileformats"></a>HDFS File Formats
     
    -where `<pxf parameters>` is:
    +The PXF HDFS plug-in supports reading the following file formats:
     
    -``` pre
    -   
FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
    - | PROFILE=profile-name
    -```
    +- Text File - comma-separated value (.csv) or delimited format plain text 
file
    +- Avro - JSON-defined, schema-based data serialization format
     
    -**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
    +The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
     
    -Use an SQL `SELECT` statement to read from an HDFS READABLE table:
    +- `HdfsTextSimple` - text files
    +- `HdfsTextMulti` - text files with embedded line feeds
    +- `Avro` - Avro files
     
    -``` sql
    -SELECT ... FROM table_name;
    +If you find that the pre-defined PXF HDFS profiles do not meet your needs, 
you may choose to create a custom HDFS profile from the existing HDFS 
serialization and deserialization classes. Refer to [Adding and Updating 
Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on 
creating a custom profile.
    +
    +## <a id="hdfsplugin_cmdline"></a>HDFS Shell Commands
    +Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, and so forth. 
    +
    +The HDFS file system command syntax is `hdfs dfs <options> [<file>]`. 
Invoked with no options, `hdfs dfs` lists the file system options supported by 
the tool.
    +
    +`hdfs dfs` options used in this topic are:
    +
    +| Option  | Description |
    +|-------|-------------------------------------|
    +| `-cat`    | Display file contents. |
    +| `-mkdir`    | Create directory in HDFS. |
    +| `-put`    | Copy file from local file system to HDFS. |
    +
    +Examples:
    +
    +Create a directory in HDFS:
    +
    +``` shell
    +$ sudo -u hdfs hdfs dfs -mkdir -p /data/exampledir
    --- End diff ---
    
    You don't necessarily have to run hdfs commands as `sudo -u hdfs`; any user with the HDFS client and the appropriate permissions can run them directly.
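    
    For example (a minimal sketch, assuming the current user has the HDFS client on its `PATH` and write access to the parent directory), the same directory from the example could be created and listed without switching users:
    
    ``` shell
    $ hdfs dfs -mkdir -p /data/exampledir
    $ hdfs dfs -ls /data
    ```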

