[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612280#comment-15612280 ]
ASF GitHub Bot commented on HAWQ-1107:
--------------------------------------
Github user kavinderd commented on a diff in the pull request:
https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85361807
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,506 +2,449 @@
title: Accessing HDFS File Data
---
-## <a id="installingthepxfhdfsplugin"></a>Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop
+applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in
+supports plain delimited and comma-separated-value format text files. The HDFS
+plug-in also supports the Avro binary format.
-Before working with HDFS file data using HAWQ and PXF, you should perform
-the following operations:
+This section describes how to use PXF to access HDFS data, including how
+to create and query an external table from files in the HDFS data store.
-- Test PXF on HDFS before connecting to Hive or HBase.
-- Ensure that all HDFS users have read permissions to HDFS services and
-that write permissions have been limited to specific users.
+## <a id="hdfsplugin_prereq"></a>Prerequisites
-## <a id="syntax1"></a>Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
-The syntax for creating an external HDFS file is as follows:
+- The HDFS plug-in is installed on all cluster nodes. See [Installing
+PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+- All HDFS users have read permissions to HDFS services, and write
+permissions are restricted to specific users.
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name
- ( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?<pxf parameters>[&custom-option=value...]')
- FORMAT '[TEXT | CSV | CUSTOM]' (<formatting_properties>);
-```
+## <a id="hdfsplugin_fileformats"></a>HDFS File Formats
-where `<pxf parameters>` is:
+The PXF HDFS plug-in supports reading the following file formats:
-``` pre
-[FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text file
+- Avro - JSON-defined, schema-based data serialization format
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file
+formats listed above:
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
-``` sql
-SELECT ... FROM table_name;
-```
+If you find that the pre-defined PXF HDFS profiles do not meet your needs,
+you may choose to create a custom HDFS profile from the existing HDFS
+serialization and deserialization classes. Refer to [Adding and Updating
+Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on
+creating a custom profile.
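+
+The following example shows the general shape of a readable external table
+that uses the `HdfsTextSimple` profile. The host name, PXF port, file path,
+and table and column names here are illustrative only; substitute the values
+for your own deployment and data:
+
+``` sql
+CREATE EXTERNAL TABLE pxf_hdfs_example(location text, month text, total_sales float8)
+LOCATION ('pxf://namenode:51200/data/pxf_examples/example.csv?PROFILE=HdfsTextSimple')
+FORMAT 'TEXT' (delimiter=E',');
+
+SELECT * FROM pxf_hdfs_example;
+```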
+
+## <a id="hdfsplugin_cmdline"></a>HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.
+These tools support typical file system operations including copying and
+listing files, changing file permissions, and so forth.
+
+The HDFS file system command syntax is `hdfs dfs <options> [<file>]`.
+Invoked with no options, `hdfs dfs` lists the file system options supported by
+the tool.
+
+`hdfs dfs` options used in this topic are:
+
+| Option | Description |
+|-------|-------------------------------------|
+| `-cat` | Display file contents. |
+| `-mkdir` | Create directory in HDFS. |
+| `-put` | Copy file from local file system to HDFS. |
+
+Examples:
+
+Create a directory in HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -mkdir -p /data/exampledir
--- End diff ---
You don't necessarily have to run hdfs commands as `sudo -u hdfs` if the
current user has the hdfs client and permissions.
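A minimal sketch of that suggestion, assuming the current user has the `hdfs` client on its `PATH` and write permission on the target path. The `RUN=echo` guard makes this a dry run that only prints each command; the `/tmp/example.csv` file name is hypothetical:

``` shell
# Dry run: RUN=echo prints each command instead of executing it.
# Set RUN to empty to execute for real as the current (non-hdfs) user.
RUN=echo
$RUN hdfs dfs -mkdir -p /data/exampledir
# /tmp/example.csv is a hypothetical local file to copy into HDFS
$RUN hdfs dfs -put /tmp/example.csv /data/exampledir/
```

If the current user lacks permission on the target directory, a one-time grant by the HDFS superuser (for example, via `hdfs dfs -chown`) lets all subsequent commands run without `sudo`.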
> PXF HDFS documentation - restructure content and include more examples
> ----------------------------------------------------------------------
>
> Key: HAWQ-1107
> URL: https://issues.apache.org/jira/browse/HAWQ-1107
> Project: Apache HAWQ
> Issue Type: Improvement
> Components: Documentation
> Reporter: Lisa Owen
> Assignee: David Yozie
> Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF HDFS documentation does not include any runnable examples.
> add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SequenceWritable,
> Avro) profiles. restructure the content as well.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)