http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/plext/using_plr.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/plext/using_plr.html.md.erb 
b/markdown/plext/using_plr.html.md.erb
new file mode 100644
index 0000000..367a1d0
--- /dev/null
+++ b/markdown/plext/using_plr.html.md.erb
@@ -0,0 +1,229 @@
+---
+title: Using PL/R in HAWQ
+---
+
+PL/R is a procedural language. With the HAWQ PL/R extension, you can write 
database functions in the R programming language and use R packages that 
contain R functions and data sets.
+
+**Note**: To use PL/R in HAWQ, R must be installed on each node in your HAWQ 
cluster. Additionally, you must install the PL/R package on an existing HAWQ 
deployment or have specified PL/R as a build option when compiling HAWQ.
+
+## <a id="plrexamples"></a>PL/R Examples 
+
+This section contains simple PL/R examples.
+
+### <a id="example1"></a>Example 1: Using PL/R for Single Row Operators 
+
+This function generates an array of numbers with a normal distribution using 
the R function `rnorm()`.
+
+```sql
+CREATE OR REPLACE FUNCTION r_norm(n integer, mean float8, 
+  std_dev float8) RETURNS float8[] AS
+$$
+  x<-rnorm(n,mean,std_dev)
+  return(x)
+$$
+LANGUAGE 'plr';
+```
+
+The following `CREATE TABLE` command uses the `r_norm` function to populate 
the table. The `r_norm` function creates an array of 10 numbers.
+
+```sql
+CREATE TABLE test_norm_var
+  AS SELECT id, r_norm(10,0,1) AS x
+  FROM (SELECT generate_series(1,30::bigint) AS id) foo
+  DISTRIBUTED BY (id);
+```
+
+### <a id="example2"></a>Example 2: Returning PL/R data.frames in Tabular Form 
+
+If your PL/R function returns an R `data.frame` as its output \(rather than arrays of arrays\), some work is required for HAWQ to see the `data.frame` as a simple SQL table:
+
+Create a TYPE in HAWQ with the same dimensions as your R `data.frame`:
+
+```sql
+CREATE TYPE t1 AS ...
+```
+
+Use this TYPE when defining your PL/R function:
+
+```sql
+... RETURNS SETOF t1 AS ...
+```
+
+Sample SQL for this situation is provided in the next example.
+
+### <a id="example3"></a>Example 3: Process Employee Information Using PL/R 
+
+The SQL below defines a TYPE and a function to process employee information 
with `data.frame` using PL/R:
+
+```sql
+-- Create type to store employee information
+DROP TYPE IF EXISTS emp_type CASCADE;
+CREATE TYPE emp_type AS (name text, age int, salary numeric(10,2));
+
+-- Create function to process employee information and return data.frame
+DROP FUNCTION IF EXISTS get_emps();
+CREATE OR REPLACE FUNCTION get_emps() RETURNS SETOF emp_type AS '
+    names <- c("Joe","Jim","Jon")
+    ages <- c(41,25,35)
+    salaries <- c(250000,120000,50000)
+    df <- data.frame(name = names, age = ages, salary = salaries)
+
+    return(df)
+' LANGUAGE 'plr';
+
+-- Call the function
+SELECT * FROM get_emps();
+```
+
+
+## <a id="downloadinstallplrlibraries"></a>Downloading and Installing R 
Packages 
+
+R packages are modules that contain R functions and data sets. You can install 
R packages to extend R and PL/R functionality in HAWQ.
+
+**Note**: If you expand HAWQ and add segment hosts, you must install the R packages in the R installation of *each* of the new hosts.
+
+1. For each R package that you want to install, identify all dependent R packages and the web URL for each package. You can find this information by selecting the given package on the following navigation page:
+
+       
[http://cran.r-project.org/web/packages/available_packages_by_name.html](http://cran.r-project.org/web/packages/available_packages_by_name.html)
+
+       As an example, the page for the R package `arm` indicates that the 
package requires the following R libraries: `Matrix`, `lattice`, `lme4`, 
`R2WinBUGS`, `coda`, `abind`, `foreign`, and `MASS`.
+       
+       You can also try installing the package with the `R CMD INSTALL` command to determine the dependent packages.
+       
+       For the R installation included with the HAWQ PL/R extension, the required R packages are installed with the PL/R extension. However, the `Matrix` package must be updated to a newer version.
+       
+1. From the command line, use the `wget` utility to download the tar.gz files 
for the `arm` package to the HAWQ master host:
+
+       ```shell
+       $ wget 
http://cran.r-project.org/src/contrib/Archive/arm/arm_1.5-03.tar.gz
+       $ wget 
http://cran.r-project.org/src/contrib/Archive/Matrix/Matrix_0.9996875-1.tar.gz
+       ```
+
+1. Use the `hawq scp` utility and the `hawq_hosts` file to copy the tar.gz 
files to the same directory on all nodes of the HAWQ cluster. The `hawq_hosts` 
file contains a list of all of the HAWQ segment hosts. You might require root 
access to do this.
+
+       ```shell
+       $ hawq scp -f hawq_hosts Matrix_0.9996875-1.tar.gz =:/home/gpadmin
+       $ hawq scp -f hawq_hosts arm_1.5-03.tar.gz =:/home/gpadmin
+       ```
+
+1. Use the `hawq ssh` utility in interactive mode to log in to each HAWQ segment host (`hawq ssh -f hawq_hosts`). Install the packages from the command prompt using the `R CMD INSTALL` command. Note that this may require root access. For example, this command installs the `Matrix` and `arm` packages:
+
+       ```shell
+       $ R CMD INSTALL Matrix_0.9996875-1.tar.gz arm_1.5-03.tar.gz
+       ```
+       **Note**: Some packages require compilation. Refer to the package 
documentation for possible build requirements.
+
+1. Ensure that the R package was installed in the `/usr/lib64/R/library` directory on all of the segment hosts (you can use `hawq ssh` to verify the installation). For example, this `hawq ssh` command lists the contents of the R library directory:
+
+       ```shell
+       $ hawq ssh -f hawq_hosts "ls /usr/lib64/R/library"
+       ```
+       
+1. Verify the R package can be loaded.
+
+       This function performs a simple test to determine if an R package can 
be loaded:
+       
+       ```sql
+       CREATE OR REPLACE FUNCTION R_test_require(fname text)
+       RETURNS boolean AS
+       $BODY$
+       return(require(fname,character.only=T))
+       $BODY$
+       LANGUAGE 'plr';
+       ```
+
+       This SQL command calls the previous function to determine if the R 
package `arm` can be loaded:
+       
+       ```sql
+       SELECT R_test_require('arm');
+       ```
+
+## <a id="rlibrarydisplay"></a>Displaying R Library Information 
+
+You can use the R command line to display information about the installed libraries and functions on the HAWQ host. You can also add and remove libraries from the R installation. To start the R command line on the host, log in to the host as the `gpadmin` user and run the `R` command:
+
+``` shell
+$ R
+```
+
+This R function lists the available R packages from the R command line:
+
+```r
+> library()
+```
+
+Display the documentation for a particular R package:
+
+```r
+> library(help="package_name")
+> help(package="package_name")
+```
+
+Display the help file for an R function:
+
+```r
+> help("function_name")
+> ?function_name
+```
+
+To see which packages are installed, use the R command `installed.packages()`. This returns a matrix with a row for each package that has been installed.
+
+```r
+> installed.packages()
+```
+
+Any package that does not appear in the installed packages matrix must be 
installed and loaded before its functions can be used.
+
+An R package can be installed with `install.packages()`:
+
+```r
+> install.packages("package_name") 
+> install.packages("mypkg", dependencies = TRUE, type="source")
+```
+
+Load a package from the R command line:
+
+```r
+> library("package_name")
+```
+
+Remove a package with `remove.packages()`:
+
+```r
+> remove.packages("package_name")
+```
+
+You can use the `-e` option of the `R` command to run functions from the command line. For example, this command displays help on the R package named `MASS`:
+
+```shell
+$ R -e 'help("MASS")'
+```
+
+## <a id="plrreferences"></a>References 
+
+[http://www.r-project.org/](http://www.r-project.org/) - The R Project home 
page
+
+[https://github.com/pivotalsoftware/gp-r](https://github.com/pivotalsoftware/gp-r)
 - GitHub repository that contains information about using R.
+
+[https://github.com/pivotalsoftware/PivotalR](https://github.com/pivotalsoftware/PivotalR)
 - GitHub repository for PivotalR, a package that provides an R interface to 
operate on HAWQ tables and views that is similar to the R `data.frame`. 
PivotalR also supports using the machine learning package MADlib directly from 
R.
+
+R documentation is installed with the R package:
+
+```shell
+/usr/share/doc/R-N.N.N
+```
+
+where N.N.N corresponds to the version of R installed.
+
+### <a id="rfunctions"></a>R Functions and Arguments 
+
+See 
[http://www.joeconway.com/plr/doc/plr-funcs.html](http://www.joeconway.com/plr/doc/plr-funcs.html).
+
+### <a id="passdatavalues"></a>Passing Data Values in R 
+
+See 
[http://www.joeconway.com/plr/doc/plr-data.html](http://www.joeconway.com/plr/doc/plr-data.html).
+
+### <a id="aggregatefunctions"></a>Aggregate Functions in R 
+
+See 
[http://www.joeconway.com/plr/doc/plr-aggregate-funcs.html](http://www.joeconway.com/plr/doc/plr-aggregate-funcs.html).
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/pxf/ConfigurePXF.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/pxf/ConfigurePXF.html.md.erb 
b/markdown/pxf/ConfigurePXF.html.md.erb
new file mode 100644
index 0000000..fec6b27
--- /dev/null
+++ b/markdown/pxf/ConfigurePXF.html.md.erb
@@ -0,0 +1,69 @@
+---
+title: Configuring PXF
+---
+
+This topic describes how to configure the PXF service.
+
+**Note:** After you make any changes to a PXF configuration file (such as 
`pxf-profiles.xml` for adding custom profiles), propagate the changes to all 
nodes with PXF installed, and then restart the PXF service on all nodes.
+
+## <a id="settingupthejavaclasspath"></a>Setting up the Java Classpath
+
+The classpath for the PXF service is set during the plug-in installation 
process. Administrators should only modify it when adding new PXF connectors. 
The classpath is defined in two files:
+
+1.  `/etc/pxf/conf/pxf-private.classpath` – contains all the required 
resources to run the PXF service, including pxf-hdfs, pxf-hbase, and pxf-hive 
plug-ins. This file must not be edited or removed.
+2.  `/etc/pxf/conf/pxf-public.classpath` – plug-in jar files and any 
dependent jar files for custom plug-ins and custom profiles should be added 
here. The classpath resources should be defined one per line. Wildcard 
characters can be used in the name of the resource, but not in the full path. 
See [Adding and Updating Profiles](ReadWritePXF.html#addingandupdatingprofiles) 
for information on adding custom profiles.
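+
+    For example, a `pxf-public.classpath` file that registers a hypothetical custom connector might contain entries like these (the jar names and paths are illustrative only):
+
+    ``` pre
+    /usr/lib/pxf/my-custom-connector.jar
+    /usr/lib/pxf/my-connector-dependency-*.jar
+    ```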
+
+After changing the classpath files, the PXF service must be restarted. 
+
+## <a id="settingupthejvmcommandlineoptionsforpxfservice"></a>Setting up the 
JVM Command Line Options for the PXF Service
+
+The PXF service JVM command line options can be added or modified for each PXF service instance in the `/var/pxf/pxf-service/bin/setenv.sh` file.
+
+Currently the `JVM_OPTS` parameter is set with the following values for 
maximum Java heap size and thread stack size:
+
+``` shell
+JVM_OPTS="-Xmx512M -Xss256K"
+```
+
+After adding or modifying the JVM command line options, the PXF service must 
be restarted.
+
+(Refer to [Addressing PXF Memory Issues](TroubleshootingPXF.html#pxf-memcfg) 
for a related discussion of the configuration options available to address 
memory issues in your PXF deployment.)
+
+## <a id="topic_i3f_hvm_ss"></a>Using PXF on a Secure HDFS Cluster
+
+You can use PXF on a secure HDFS cluster. Read, write, and analyze operations are enabled for PXF tables on HDFS files. No changes are required for PXF tables created with a previous version.
+
+### <a id="requirements"></a>Requirements
+
+-   Both HDFS and YARN principals are created and are properly configured.
+-   HAWQ is correctly configured to work in secure mode.
+
+Please refer to [Troubleshooting PXF](TroubleshootingPXF.html) for common 
errors related to PXF security and their meaning.
+
+## <a id="credentialsforremoteservices"></a>Credentials for Remote Services
+
+The credentials feature allows a PXF plug-in to access a remote service that requires credentials.
+
+### <a id="inhawq"></a>In HAWQ
+
+Two parameters for credentials are implemented in HAWQ:
+
+-   `pxf_remote_service_login` – a string containing login information (for example, a user name).
+-   `pxf_remote_service_secret` – a string containing information that is considered secret (for example, a password).
+
+Currently, the contents of the two parameters are stored in memory for the duration of the session, without any security. The contents are discarded, insecurely, when the session ends.
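+
+For example, you might set these parameters in a `psql` session before querying a PXF external table (the parameter values shown here are illustrative only):
+
+``` sql
+SET pxf_remote_service_login = 'pxfuser';
+SET pxf_remote_service_secret = 'mypassword';
+```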
+
+**Important:** These parameters are temporary and could soon be deprecated, in 
favor of a complete solution for managing credentials for remote services in 
PXF.
+
+### <a id="inapxfplugin"></a>In a PXF Plug-in
+
+In a PXF plug-in, the contents of the two credentials parameters are available through the following InputData API functions:
+
+``` java
+string getLogin()
+string getSecret()
+```
+
+Both functions return `null` if the corresponding HAWQ parameter was set to an empty string or was not set at all.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/pxf/HBasePXF.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/pxf/HBasePXF.html.md.erb 
b/markdown/pxf/HBasePXF.html.md.erb
new file mode 100644
index 0000000..8b89730
--- /dev/null
+++ b/markdown/pxf/HBasePXF.html.md.erb
@@ -0,0 +1,105 @@
+---
+title: Accessing HBase Data
+---
+
+## <a id="installingthepxfhbaseplugin"></a>Prerequisites
+
+Before trying to access HBase data with PXF, verify the following:
+
+-   The `/etc/hbase/conf/hbase-env.sh` configuration file must reference the 
`pxf-hbase.jar`. For example, `/etc/hbase/conf/hbase-env.sh` should include the 
line:
+
+    ``` bash
+    export HBASE_CLASSPATH=${HBASE_CLASSPATH}:/usr/lib/pxf/pxf-hbase.jar
+    ```
+
+    **Note:** You must restart HBase after making any changes to the HBase 
configuration.
+
+-   PXF HBase plug-in is installed on all cluster nodes.
+-   HBase and ZooKeeper jars are installed on all cluster nodes.
+
+## <a id="syntax3"></a>Syntax
+
+To create an external HBase table, use the following syntax:
+
+``` sql
+CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
+    ( column_name data_type [, ...] | LIKE other_table )
+LOCATION ('pxf://namenode[:port]/hbase-table-name?Profile=HBase')
+FORMAT 'CUSTOM' (Formatter='pxfwritable_import');
+```
+
+The HBase profile is equivalent to the following PXF parameters:
+
+-   Fragmenter=org.apache.hawq.pxf.plugins.hbase.HBaseDataFragmenter
+-   Accessor=org.apache.hawq.pxf.plugins.hbase.HBaseAccessor
+-   Resolver=org.apache.hawq.pxf.plugins.hbase.HBaseResolver
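+
+For example, a table definition that names these plug-in classes explicitly instead of using `Profile=HBase` might look like this sketch (the table name, qualifier, and `namenode:51200` host are illustrative):
+
+``` sql
+CREATE EXTERNAL TABLE my_hbase_table (recordkey bytea, "cf1:qual1" text)
+LOCATION ('pxf://namenode:51200/hbase-table-name?Fragmenter=org.apache.hawq.pxf.plugins.hbase.HBaseDataFragmenter&Accessor=org.apache.hawq.pxf.plugins.hbase.HBaseAccessor&Resolver=org.apache.hawq.pxf.plugins.hbase.HBaseResolver')
+FORMAT 'CUSTOM' (Formatter='pxfwritable_import');
+```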
+
+## <a id="columnmapping"></a>Column Mapping
+
+Most HAWQ external tables (PXF or others) require that the HAWQ table 
attributes match the source data record layout, and include all the available 
attributes. With HAWQ, however, you use the PXF HBase plug-in to specify the 
subset of HBase qualifiers that define the HAWQ PXF table. To set up a clear 
mapping between each attribute in the PXF table and a specific qualifier in the 
HBase table, you can use either direct mapping or indirect mapping. In 
addition, the HBase row key is handled in a special way.
+
+### <a id="rowkey"></a>Row Key
+
+You can use the HBase table row key in several ways. For example, you can include it in query results, or you can filter on a range of row key values in a `WHERE` clause. To use the row key in a HAWQ query, define the HAWQ table with the reserved PXF attribute `recordkey`. This attribute name tells PXF to return the record key in any key-value based system, including HBase.
+
+**Note:** Because HBase is byte-based rather than character-based, you should define the `recordkey` as type `bytea`. This may improve your ability to filter data and increase performance.
+
+``` sql
+CREATE EXTERNAL TABLE <tname> (recordkey bytea, ... ) LOCATION ('pxf:// ...')
+```
+
+### <a id="directmapping"></a>Direct Mapping
+
+Use Direct Mapping to map HAWQ table attributes to HBase qualifiers. You can 
specify the HBase qualifier names of interest, with column family names 
included, as quoted values. 
+
+For example, you have defined an HBase table called `hbase_sales` with 
multiple column families and many qualifiers. To create a HAWQ table with these 
attributes:
+
+-   `rowkey`
+-   qualifier `saleid` in the column family `cf1`
+-   qualifier `comments` in the column family `cf8` 
+
+use the following `CREATE EXTERNAL TABLE` syntax:
+
+``` sql
+CREATE EXTERNAL TABLE hbase_sales (
+  recordkey bytea,
+  "cf1:saleid" int,
+  "cf8:comments" varchar
+) ...
+```
+
+The PXF HBase plug-in uses these attribute names as-is and returns the values 
of these HBase qualifiers.
+
+### <a id="indirectmappingvialookuptable"></a>Indirect Mapping (via Lookup 
Table)
+
+The direct mapping method is fast and intuitive, but using indirect mapping 
helps to reconcile HBase qualifier names with HAWQ behavior:
+
+-   HBase qualifier names may be longer than 32 characters. HAWQ has a 
32-character limit on attribute name size.
+-   HBase qualifier names can be binary or non-printable. HAWQ attribute 
names are character based.
+
+In either case, Indirect Mapping uses a lookup table on HBase. You can create 
the lookup table to store all necessary lookup information. This works as a 
template for any future queries. The name of the lookup table must be 
`pxflookup` and must include the column family named `mapping`.
+
+Continuing the sales example from Direct Mapping, the row key of the lookup table holds the HBase table name, and the `mapping` column family stores the actual attribute mapping in the key-value form `<hawq attr name>=<hbase cf:qualifier>`.
+
+#### <a id="example5"></a>Example
+
+This example maps the `saleid` qualifier in the `cf1` column family to the 
HAWQ `id` column and the `comments` qualifier in the `cf8` family to the HAWQ 
`cmts` column.
+
+| (row key) | mapping           |
+|-----------|-------------------|
+| sales     | id=cf1:saleid     |
+| sales     | cmts=cf8:comments |
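+
+One way to populate this lookup table is from the HBase shell; the commands below are a sketch, with table and qualifier names taken from the example above:
+
+``` pre
+create 'pxflookup', 'mapping'
+put 'pxflookup', 'sales', 'mapping:id', 'cf1:saleid'
+put 'pxflookup', 'sales', 'mapping:cmts', 'cf8:comments'
+```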
+
+The mapping assigns a new name to each qualifier. You can use these names in your HAWQ table definition:
+
+``` sql
+CREATE EXTERNAL TABLE hbase_sales (
+  recordkey bytea,
+  id int,
+  cmts varchar
+) ...
+```
+
+PXF automatically matches HAWQ to HBase column names when a `pxflookup` table 
exists in HBase.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/pxf/HDFSFileDataPXF.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/pxf/HDFSFileDataPXF.html.md.erb 
b/markdown/pxf/HDFSFileDataPXF.html.md.erb
new file mode 100644
index 0000000..2021565
--- /dev/null
+++ b/markdown/pxf/HDFSFileDataPXF.html.md.erb
@@ -0,0 +1,452 @@
+---
+title: Accessing HDFS File Data
+---
+
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
+
+This section describes how to use PXF to access HDFS data, including how to 
create and query an external table from files in the HDFS data store.
+
+## <a id="hdfsplugin_prereq"></a>Prerequisites
+
+Before working with HDFS file data using HAWQ and PXF, ensure that:
+
+-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF 
Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   All HDFS users have read permission on the HDFS services, and write permission is restricted to specific users.
+
+## <a id="hdfsplugin_fileformats"></a>HDFS File Formats
+
+The PXF HDFS plug-in supports reading the following file formats:
+
+- Text File - comma-separated value (.csv) or delimited format plain text file
+- Avro - JSON-defined, schema-based data serialization format
+
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
+
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
+
+If you find that the pre-defined PXF HDFS profiles do not meet your needs, you 
may choose to create a custom HDFS profile from the existing HDFS serialization 
and deserialization classes. Refer to [Adding and Updating 
Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on 
creating a custom profile.
+
+## <a id="hdfsplugin_cmdline"></a>HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  These 
tools support typical file system operations including copying and listing 
files, changing file permissions, and so forth.
+
+The HDFS file system command syntax is `hdfs dfs <options> [<file>]`. Invoked 
with no options, `hdfs dfs` lists the file system options supported by the tool.
+
+The user invoking the `hdfs dfs` command must have sufficient privileges to 
the HDFS data store to perform HDFS file system operations. Specifically, the 
user must have write permission to HDFS to create directories and files.
+
+`hdfs dfs` options used in this topic are:
+
+| Option  | Description |
+|-------|-------------------------------------|
+| `-cat`    | Display file contents. |
+| `-mkdir`    | Create directory in HDFS. |
+| `-put`    | Copy file from local file system to HDFS. |
+
+Examples:
+
+Create a directory in HDFS:
+
+``` shell
+$ hdfs dfs -mkdir -p /data/exampledir
+```
+
+Copy a text file to HDFS:
+
+``` shell
+$ hdfs dfs -put /tmp/example.txt /data/exampledir/
+```
+
+Display the contents of a text file in HDFS:
+
+``` shell
+$ hdfs dfs -cat /data/exampledir/example.txt
+```
+
+
+## <a id="hdfsplugin_queryextdata"></a>Querying External HDFS Data
+The PXF HDFS plug-in supports the `HdfsTextSimple`, `HdfsTextMulti`, and 
`Avro` profiles.
+
+Use the following syntax to create a HAWQ external table representing HDFS 
data: 
+
+``` sql
+CREATE EXTERNAL TABLE <table_name> 
+    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
+LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
+    ?PROFILE=HdfsTextSimple|HdfsTextMulti|Avro[&<custom-option>=<value>[...]]')
+FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
+```
+
+HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL 
TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the 
table below.
+
+| Keyword  | Value |
+|-------|-------------------------------------|
+| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
+| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
+| PROFILE    | The `PROFILE` keyword must specify one of the values 
`HdfsTextSimple`, `HdfsTextMulti`, or `Avro`. |
+| \<custom-option\>  | \<custom-option\> is profile-specific. Profile-specific 
options are discussed in the relevant profile topic later in this section.|
+| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when 
\<path-to-hdfs-file\> references a plain text delimited file.  |
+| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` and 
`HdfsTextMulti` profiles when \<path-to-hdfs-file\> references a 
comma-separated value file.  |
+| FORMAT 'CUSTOM' | Use the '`CUSTOM`' `FORMAT` with the `Avro` profile. The `Avro` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_import')` \<formatting-property\>. |
+| \<formatting-properties\> | \<formatting-properties\> are profile-specific. Profile-specific formatting options are discussed in the relevant profile topic later in this section. |
+
+**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
+
+## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
+
+Use the `HdfsTextSimple` profile when reading plain text delimited or .csv 
files where each row is a single record.
+
+\<formatting-properties\> supported by the `HdfsTextSimple` profile include:
+
+| Keyword  | Value |
+|-------|-------------------------------------|
+| delimiter    | The delimiter character in the file. Default value is a comma 
`,`.|
+
+### <a id="profile_hdfstextsimple_query"></a>Example: Using the HdfsTextSimple 
Profile
+
+Perform the following steps to create a sample data file, copy the file to 
HDFS, and use the `HdfsTextSimple` profile to create PXF external tables to 
query the data:
+
+1. Create an HDFS directory for PXF example data files:
+
+    ``` shell
+    $ hdfs dfs -mkdir -p /data/pxf_examples
+    ```
+
+2. Create a delimited plain text data file named `pxf_hdfs_simple.txt`:
+
+    ``` shell
+    $ echo 'Prague,Jan,101,4875.33
+Rome,Mar,87,1557.39
+Bangalore,May,317,8936.99
+Beijing,Jul,411,11600.67' > /tmp/pxf_hdfs_simple.txt
+    ```
+
+    Note the use of the comma `,` to separate the four data fields.
+
+4. Add the data file to HDFS:
+
+    ``` shell
+    $ hdfs dfs -put /tmp/pxf_hdfs_simple.txt /data/pxf_examples/
+    ```
+
+5. Display the contents of the `pxf_hdfs_simple.txt` file stored in HDFS:
+
+    ``` shell
+    $ hdfs dfs -cat /data/pxf_examples/pxf_hdfs_simple.txt
+    ```
+
+1. Use the `HdfsTextSimple` profile to create a queryable HAWQ external table 
from the `pxf_hdfs_simple.txt` file you previously created and added to HDFS:
+
+    ``` sql
+    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month 
text, num_orders int, total_sales float8)
+                LOCATION 
('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
+              FORMAT 'TEXT' (delimiter=E',');
+    gpadmin=# SELECT * FROM pxf_hdfs_textsimple;          
+    ```
+
+    ``` pre
+       location    | month | num_orders | total_sales 
+    ---------------+-------+------------+-------------
+     Prague        | Jan   |        101 |     4875.33
+     Rome          | Mar   |         87 |     1557.39
+     Bangalore     | May   |        317 |     8936.99
+     Beijing       | Jul   |        411 |    11600.67
+    (4 rows)
+    ```
+
+2. Create a second external table from `pxf_hdfs_simple.txt`, this time using 
the `CSV` `FORMAT`:
+
+    ``` sql
+    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple_csv(location text, 
month text, num_orders int, total_sales float8)
+                LOCATION 
('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
+              FORMAT 'CSV';
+    gpadmin=# SELECT * FROM pxf_hdfs_textsimple_csv;          
+    ```
+
+    When specifying `FORMAT 'CSV'` for a comma-separated value file, no 
`delimiter` formatter option is required, as comma is the default.
+
+## <a id="profile_hdfstextmulti"></a>HdfsTextMulti Profile
+
+Use the `HdfsTextMulti` profile when reading plain text files with delimited single- or multi-line records that include embedded (quoted) linefeed characters.
+
+\<formatting-properties\> supported by the `HdfsTextMulti` profile include:
+
+| Keyword  | Value |
+|-------|-------------------------------------|
+| delimiter    | The delimiter character in the file. |
+
+### <a id="profile_hdfstextmulti_query"></a>Example: Using the HdfsTextMulti 
Profile
+
+Perform the following steps to create a sample data file, copy the file to 
HDFS, and use the `HdfsTextMulti` profile to create a PXF external table to 
query the data:
+
+1. Create a second delimited plain text file:
+
+    ``` shell
+    $ vi /tmp/pxf_hdfs_multi.txt
+    ```
+
+2. Copy/paste the following data into `pxf_hdfs_multi.txt`:
+
+    ``` pre
+    "4627 Star Rd.
+    San Francisco, CA  94107":Sept:2017
+    "113 Moon St.
+    San Diego, CA  92093":Jan:2018
+    "51 Belt Ct.
+    Denver, CO  90123":Dec:2016
+    "93114 Radial Rd.
+    Chicago, IL  60605":Jul:2017
+    "7301 Brookview Ave.
+    Columbus, OH  43213":Dec:2018
+    ```
+
+    Notice the use of the colon `:` to separate the three fields. Also notice 
the quotes around the first (address) field. This field includes an embedded 
line feed separating the street address from the city and state.
+
+3. Add the data file to HDFS:
+
+    ``` shell
+    $ hdfs dfs -put /tmp/pxf_hdfs_multi.txt /data/pxf_examples/
+    ```
+
+4. Use the `HdfsTextMulti` profile to create a queryable external table from 
the `pxf_hdfs_multi.txt` HDFS file, making sure to identify the `:` as the 
field separator:
+
+    ``` sql
+    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textmulti(address text, month 
text, year int)
+                LOCATION 
('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_multi.txt?PROFILE=HdfsTextMulti')
+              FORMAT 'CSV' (delimiter=E':');
+    ```
+    
+2. Query the `pxf_hdfs_textmulti` table:
+
+    ``` sql
+    gpadmin=# SELECT * FROM pxf_hdfs_textmulti;
+    ```
+
+    ``` pre
+             address          | month | year 
+    --------------------------+-------+------
+     4627 Star Rd.            | Sept  | 2017
+     San Francisco, CA  94107           
+     113 Moon St.             | Jan   | 2018
+     San Diego, CA  92093               
+     51 Belt Ct.              | Dec   | 2016
+     Denver, CO  90123                  
+     93114 Radial Rd.         | Jul   | 2017
+     Chicago, IL  60605                 
+     7301 Brookview Ave.      | Dec   | 2018
+     Columbus, OH  43213                
+    (5 rows)
+    ```
+
+## <a id="profile_hdfsavro"></a>Avro Profile
+
+Apache Avro is a data serialization framework where the data is serialized in 
a compact binary format. 
+
+Avro specifies that data types be defined in JSON. Avro format files have an 
independent schema, also defined in JSON. An Avro schema, together with its 
data, is fully self-describing.
+
+### <a id="profile_hdfsavrodatamap"></a>Data Type Mapping
+
+Avro supports both primitive and complex data types. 
+
+To represent Avro primitive data types in HAWQ, map data values to HAWQ 
columns of the same type. 
+
+Avro supports complex data types including arrays, maps, records, 
enumerations, and fixed types. Map top-level fields of these complex data types 
to the HAWQ `TEXT` type. While HAWQ does not natively support these types, you 
can create HAWQ functions or application code to extract or further process 
subcomponents of these complex data types.
+
+The following table summarizes external mapping rules for Avro data.
+
+<a id="topic_oy3_qwm_ss__table_j4s_h1n_ss"></a>
+
+| Avro Data Type | PXF/HAWQ Data Type |
+|-------------------------------------------------------------------|-----------------------------------------------------------|
+| Primitive type (int, double, float, long, string, bytes, boolean) | Use the corresponding HAWQ built-in data type; see [Data Types](../reference/HAWQDataTypes.html). |
+| Complex type: Array, Map, Record, or Enum | TEXT, with delimiters inserted between collection items, mapped key-value pairs, and record data. |
+| Complex type: Fixed | BYTEA |
+| Union | Follows the above conventions for primitive or complex data types, depending on the union; supports Null values. |
+
+### <a id="profile_hdfsavroptipns"></a>Avro-Specific Custom Options
+
+For complex types, the PXF `Avro` profile inserts default delimiters between 
collection items and values. You can use non-default delimiter characters by 
identifying values for specific `Avro` custom options in the `CREATE EXTERNAL 
TABLE` call. 
+
+The `Avro` profile supports the following \<custom-options\>:
+
+| Option Name | Description |
+|-------------|-------------|
+| COLLECTION_DELIM | The delimiter character(s) to place between entries in a top-level array, map, or record field when PXF maps an Avro complex data type to a text column. The default is the comma `,` character. |
+| MAPKEY_DELIM | The delimiter character(s) to place between the key and value of a map entry when PXF maps an Avro complex data type to a text column. The default is the colon `:` character. |
+| RECORDKEY_DELIM | The delimiter character(s) to place between the field name and value of a record entry when PXF maps an Avro complex data type to a text column. The default is the colon `:` character. |
+
+
+### <a id="topic_tr3_dpg_ts__section_m2p_ztg_ts"></a>Avro Schemas and Data
+
+Avro schemas are defined using JSON, and composed of the same primitive and 
complex types identified in the data mapping section above. Avro schema files 
typically have a `.avsc` suffix.
+
+Fields in an Avro schema file are defined via an array of objects, each of 
which is specified by a name and a type.
+
+
+### <a id="topic_tr3_dpg_ts_example"></a>Example: Using the Avro Profile
+
+The examples in this section will operate on Avro data with the following 
record schema:
+
+- id - long
+- username - string
+- followers - array of string
+- fmap - map of long
+- address - record comprised of street number (int), street name (string), and 
city (string)
+- relationship - enumerated type
+
+
+#### <a id="topic_tr3_dpg_ts__section_m2p_ztg_ts_99"></a>Create Schema
+
+Perform the following operations to create an Avro schema to represent the 
example schema described above.
+
+1. Create a file named `avro_schema.avsc`:
+
+    ``` shell
+    $ vi /tmp/avro_schema.avsc
+    ```
+
+2. Copy and paste the following text into `avro_schema.avsc`:
+
+    ``` json
+    {
+      "type" : "record",
+      "name" : "example_schema",
+      "namespace" : "com.example",
+      "fields" : [ {
+        "name" : "id",
+        "type" : "long",
+        "doc" : "Id of the user account"
+      }, {
+        "name" : "username",
+        "type" : "string",
+        "doc" : "Name of the user account"
+      }, {
+        "name" : "followers",
+        "type" : {"type": "array", "items": "string"},
+        "doc" : "Users followers"
+      }, {
+        "name": "fmap",
+        "type": {"type": "map", "values": "long"}
+      }, {
+        "name": "relationship",
+        "type": {
+            "type": "enum",
+            "name": "relationshipEnum",
+            "symbols": ["MARRIED","LOVE","FRIEND","COLLEAGUE","STRANGER","ENEMY"]
+        }
+      }, {
+        "name": "address",
+        "type": {
+            "type": "record",
+            "name": "addressRecord",
+            "fields": [
+                {"name":"number", "type":"int"},
+                {"name":"street", "type":"string"},
+                {"name":"city", "type":"string"}]
+        }
+      } ],
+      "doc" : "A basic schema for storing messages"
+    }
+    ```
+
+#### <a id="topic_tr3_dpgspk_15g_tsdata"></a>Create Avro Data File (JSON)
+
+Perform the following steps to create a sample Avro data file conforming to 
the above schema.
+
+1.  Create a text file named `pxf_hdfs_avro.txt`:
+
+    ``` shell
+    $ vi /tmp/pxf_hdfs_avro.txt
+    ```
+
+2. Enter the following data into `pxf_hdfs_avro.txt`:
+
+    ``` pre
+    {"id":1, "username":"john","followers":["kate", "santosh"], "relationship": "FRIEND", "fmap": {"kate":10,"santosh":4}, "address":{"number":1, "street":"renaissance drive", "city":"san jose"}}
+    
+    {"id":2, "username":"jim","followers":["john", "pam"], "relationship": "COLLEAGUE", "fmap": {"john":3,"pam":3}, "address":{"number":9, "street":"deer creek", "city":"palo alto"}}
+    ```
+
+    The sample data uses a comma `,` to separate top-level fields and a colon `:` to separate map keys and record field names from their values.
+
+3. Convert the text file to Avro format. There are various ways to perform the 
conversion, both programmatically and via the command line. In this example, we 
use the [Java Avro tools](http://avro.apache.org/releases.html); the jar file 
resides in the current directory:
+
+    ``` shell
+    $ java -jar ./avro-tools-1.8.1.jar fromjson --schema-file /tmp/avro_schema.avsc /tmp/pxf_hdfs_avro.txt > /tmp/pxf_hdfs_avro.avro
+    ```
+
+    The generated Avro binary data file is written to 
`/tmp/pxf_hdfs_avro.avro`. 
+    
+4. Copy the generated Avro file to HDFS:
+
+    ``` shell
+    $ hdfs dfs -put /tmp/pxf_hdfs_avro.avro /data/pxf_examples/
+    ```
+    
+#### <a id="topic_avro_querydata"></a>Query With Avro Profile
+
+Perform the following steps to create and query an external table accessing 
the `pxf_hdfs_avro.avro` file you added to HDFS in the previous section. When 
creating the table:
+
+-  Map the top-level primitive fields, `id` (type long) and `username` (type 
string), to their equivalent HAWQ types (bigint and text). 
+-  Map the remaining complex fields to type text.
+-  Explicitly set the record, map, and collection delimiters using the Avro 
profile custom options.
+
+
+1. Use the `Avro` profile to create a queryable external table from the 
`pxf_hdfs_avro.avro` file:
+
+    ``` sql
+    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_avro(id bigint, username text, followers text, fmap text, relationship text, address text)
+                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_avro.avro?PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:&RECORDKEY_DELIM=:')
+              FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
+    ```
+
+2. Perform a simple query of the `pxf_hdfs_avro` table:
+
+    ``` sql
+    gpadmin=# SELECT * FROM pxf_hdfs_avro;
+    ```
+
+    ``` pre
+     id | username |   followers    |        fmap         | relationship |                      address                      
+    ----+----------+----------------+---------------------+--------------+---------------------------------------------------
+      1 | john     | [kate,santosh] | {kate:10,santosh:4} | FRIEND       | {number:1,street:renaissance drive,city:san jose}
+      2 | jim      | [john,pam]     | {pam:3,john:3}      | COLLEAGUE    | {number:9,street:deer creek,city:palo alto}
+    (2 rows)
+    ```
+
+    The simple query of the external table shows the components of the complex 
type data separated with the delimiters identified in the `CREATE EXTERNAL 
TABLE` call.
+
+
+3. Process the delimited components in the text columns as necessary for your application. For example, the following command uses the built-in `string_to_array` function to convert entries in the `followers` field to a text array column in a new view.
+
+    ``` sql
+    gpadmin=# CREATE VIEW followers_view AS
+        SELECT username, address,
+               string_to_array(substring(followers FROM 2 FOR (char_length(followers) - 2)), ',')::text[] AS followers
+        FROM pxf_hdfs_avro;
+    ```
+
+4. Query the view to filter rows based on whether a particular follower 
appears in the array:
+
+    ``` sql
+    gpadmin=# SELECT username, address FROM followers_view WHERE followers @> '{john}';
+    ```
+
+    ``` pre
+     username |                   address                   
+    ----------+---------------------------------------------
+     jim      | {number:9,street:deer creek,city:palo alto}
+    ```
+
+## <a id="accessdataonahavhdfscluster"></a>Accessing HDFS Data in a High Availability HDFS Cluster
+
+To access external HDFS data in a High Availability HDFS cluster, change the 
`CREATE EXTERNAL TABLE` `LOCATION` clause to use \<HA-nameservice\> rather than 
 \<host\>[:\<port\>].
+
+``` sql
+gpadmin=# CREATE EXTERNAL TABLE <table_name> ( <column_name> <data_type> [, ...] | LIKE <other_table> )
+            LOCATION ('pxf://<HA-nameservice>/<path-to-hdfs-file>?PROFILE=HdfsTextSimple|HdfsTextMulti|Avro[&<custom-option>=<value>[...]]')
+         FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
+```
+
+The opposite is true when a highly available HDFS cluster is reverted to a single NameNode configuration. In that case, update any table definition that specifies \<HA-nameservice\> to use the \<host\>[:\<port\>] syntax.
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/pxf/HawqExtensionFrameworkPXF.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/pxf/HawqExtensionFrameworkPXF.html.md.erb 
b/markdown/pxf/HawqExtensionFrameworkPXF.html.md.erb
new file mode 100644
index 0000000..578d13f
--- /dev/null
+++ b/markdown/pxf/HawqExtensionFrameworkPXF.html.md.erb
@@ -0,0 +1,45 @@
+---
+title: Using PXF with Unmanaged Data
+---
+
+HAWQ Extension Framework (PXF) is an extensible framework that allows HAWQ to 
query external system data. 
+
+PXF includes built-in connectors for accessing data inside HDFS files, Hive 
tables, and HBase tables. PXF also integrates with HCatalog to query Hive 
tables directly.
+
+PXF allows users to create custom connectors to access other parallel data 
stores or processing engines. To create these connectors using Java plug-ins, 
see the [PXF External Tables and API](PXFExternalTableandAPIReference.html).
+
+-   **[Installing PXF Plug-ins](../pxf/InstallPXFPlugins.html)**
+
+    This topic describes how to install the built-in PXF service plug-ins that 
are required to connect PXF to HDFS, Hive, and HBase. You should install the 
appropriate RPMs on each node in your cluster.
+
+-   **[Configuring PXF](../pxf/ConfigurePXF.html)**
+
+    This topic describes how to configure the PXF service.
+
+-   **[Accessing HDFS File Data](../pxf/HDFSFileDataPXF.html)**
+
+    This topic describes how to access HDFS file data using PXF.
+
+-   **[Accessing Hive Data](../pxf/HivePXF.html)**
+
+    This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
+
+-   **[Accessing HBase Data](../pxf/HBasePXF.html)**
+
+    This topic describes how to access HBase data using PXF.
+
+-   **[Accessing JSON Data](../pxf/JsonPXF.html)**
+
+    This topic describes how to access JSON data using PXF.
+
+-   **[Using Profiles to Read and Write Data](../pxf/ReadWritePXF.html)**
+
+    PXF profiles are collections of common metadata attributes that can be 
used to simplify the reading and writing of data. You can use any of the 
built-in profiles that come with PXF or you can create your own.
+
+-   **[PXF External Tables and 
API](../pxf/PXFExternalTableandAPIReference.html)**
+
+    You can use the PXF API to create your own connectors to access any other 
type of parallel data store or processing engine.
+
+-   **[Troubleshooting PXF](../pxf/TroubleshootingPXF.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/pxf/HivePXF.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/pxf/HivePXF.html.md.erb b/markdown/pxf/HivePXF.html.md.erb
new file mode 100644
index 0000000..199c7a1
--- /dev/null
+++ b/markdown/pxf/HivePXF.html.md.erb
@@ -0,0 +1,700 @@
+---
+title: Accessing Hive Data
+---
+
+Apache Hive is a distributed data warehousing infrastructure. Hive facilitates managing large data sets and supports multiple data formats, including comma-separated value (.csv), RC, ORC, and Parquet. The PXF Hive plug-in reads data stored in Hive, as well as in HDFS and HBase.
+
+This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
+
+## <a id="installingthepxfhiveplugin"></a>Prerequisites
+
+Before accessing Hive data with HAWQ and PXF, ensure that:
+
+-   The PXF HDFS plug-in is installed on all cluster nodes. See [Installing 
PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   The PXF Hive plug-in is installed on all cluster nodes.
+-   The Hive JAR files and conf directory are installed on all cluster nodes.
+-   You have tested PXF on HDFS.
+-   You are running the Hive Metastore service on a machine in your cluster. 
+-   You have set the `hive.metastore.uris` property in the `hive-site.xml` 
on the NameNode.
+
+## <a id="topic_p2s_lvl_25"></a>Hive File Formats
+
+The PXF Hive plug-in supports several file formats and profiles for accessing 
these formats:
+
+| File Format  | Description | Profile |
+|--------------|-------------|---------|
+| TextFile | Flat file with data in comma-, tab-, or space-separated value format or JSON notation. | Hive, HiveText |
+| SequenceFile | Flat file consisting of binary key/value pairs. | Hive |
+| RCFile | Record columnar data consisting of binary key/value pairs; high row compression rate. | Hive, HiveRC |
+| ORCFile | Optimized row columnar data with stripe, footer, and postscript sections; reduces data size. | Hive |
+| Parquet | Compressed columnar data representation. | Hive |
+| Avro | JSON-defined, schema-based data serialization format. | Hive |
+
+Refer to [File 
Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for 
detailed information about the file formats supported by Hive.
+
+## <a id="topic_p2s_lvl_29"></a>Data Type Mapping
+
+### <a id="hive_primdatatypes"></a>Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data type 
to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive types.
+
+| Hive Data Type  | HAWQ Data Type |
+|-----------------|----------------|
+| boolean    | bool |
+| int   | int4 |
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
+| float   | float4 |
+| double   | float8 |
+| string   | text |
+| binary   | bytea |
+| timestamp   | timestamp |
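
As a hypothetical illustration of these mappings (the table and column names below are invented), a Hive table declared with several primitive types maps to a HAWQ external table as follows:

``` sql
-- Hive definition (hypothetical):
--   CREATE TABLE orders (id bigint, qty smallint, price double, note string);
-- Equivalent HAWQ external table columns, per the mapping above:
CREATE EXTERNAL TABLE pxf_orders(id int8, qty int2, price float8, note text)
    LOCATION ('pxf://namenode:51200/default.orders?PROFILE=Hive')
    FORMAT 'CUSTOM' (formatter='pxfwritable_import');
```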
+
+
+### <a id="topic_b4v_g3n_25"></a>Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. PXF 
maps each of these complex types to `text`.  While HAWQ does not natively 
support these types, you can create HAWQ functions or application code to 
extract subcomponents of these complex data types.
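
For example, application code can post-process the text rendering of a Hive array column. This Python sketch (the helper name is hypothetical) assumes the bracketed, comma-separated form shown in the complex data type example later in this topic:

``` python
# Hypothetical helper: convert the text rendering of a Hive ARRAY<int>
# column, e.g. "[1,2,3]", into a Python list of integers.
def parse_hive_int_array(text):
    inner = text.strip("[]")
    return [int(v) for v in inner.split(",")] if inner else []

print(parse_hive_int_array("[4,5,6]"))  # [4, 5, 6]
```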
+
+An example using complex data types is provided later in this topic.
+
+
+## <a id="hive_sampledataset"></a>Sample Data Set
+
+Examples used in this topic will operate on a common data set. This simple 
data set models a retail sales operation and includes fields with the following 
names and data types:
+
+| Field Name  | Data Type |
+|-------|---------------------------|
+| location | text |
+| month | text |
+| number\_of\_orders | integer |
+| total\_sales | double |
+
+Prepare the sample data set for use:
+
+1. First, create a text file:
+
+    ```
+    $ vi /tmp/pxf_hive_datafile.txt
+    ```
+
+2. Add the following data to `pxf_hive_datafile.txt`; notice the use of the 
comma `,` to separate the four field values:
+
+    ```
+    Prague,Jan,101,4875.33
+    Rome,Mar,87,1557.39
+    Bangalore,May,317,8936.99
+    Beijing,Jul,411,11600.67
+    San Francisco,Sept,156,6846.34
+    Paris,Nov,159,7134.56
+    San Francisco,Jan,113,5397.89
+    Prague,Dec,333,9894.77
+    Bangalore,Jul,271,8320.55
+    Beijing,Dec,100,4248.41
+    ```
+
+Make note of the path to `pxf_hive_datafile.txt`; you will use it in later 
exercises.
+
+
+## <a id="hivecommandline"></a>Hive Command Line
+
+The Hive command line is an interactive client, similar to `psql`. To start the Hive command line:
+
+``` shell
+$ HADOOP_USER_NAME=hdfs hive
+```
+
+The default Hive database is named `default`. 
+
+### <a id="hivecommandline_createdb"></a>Example: Create a Hive Table
+
+Create a Hive table to expose our sample data set.
+
+1. Create a Hive table named `sales_info` in the `default` database:
+
+    ``` sql
+    hive> CREATE TABLE sales_info (location string, month string,
+            number_of_orders int, total_sales double)
+            ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+            STORED AS textfile;
+    ```
+
+    Notice that:
+    - The `STORED AS textfile` subclause instructs Hive to create the table in 
Textfile (the default) format.  Hive Textfile format supports comma-, tab-, and 
space-separated values, as well as data specified in JSON notation.
+    - The `DELIMITED FIELDS TERMINATED BY` subclause identifies the field 
delimiter within a data record (line). The `sales_info` table field delimiter 
is a comma (`,`).
+
+2. Load the `pxf_hive_datafile.txt` sample data file into the `sales_info` 
table you just created:
+
+    ``` sql
+    hive> LOAD DATA LOCAL INPATH '/tmp/pxf_hive_datafile.txt'
+            INTO TABLE sales_info;
+    ```
+
+3. Perform a query on `sales_info` to verify that the data was loaded 
successfully:
+
+    ``` sql
+    hive> SELECT * FROM sales_info;
+    ```
+
+In examples later in this section, you will access the `sales_info` Hive table 
directly via PXF. You will also insert `sales_info` data into tables of other 
Hive file format types, and use PXF to access those directly as well.
+
+## <a id="topic_p2s_lvl_28"></a>Querying External Hive Data
+
+The PXF Hive plug-in supports several Hive-related profiles. These include 
`Hive`, `HiveText`, and `HiveRC`.
+
+Use the following syntax to create a HAWQ external table representing Hive 
data:
+
+``` sql
+CREATE EXTERNAL TABLE <table_name>
+    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
+LOCATION ('pxf://<host>[:<port>]/<hive-db-name>.<hive-table-name>
+    ?PROFILE=Hive|HiveText|HiveRC[&DELIMITER=<delim>'])
+FORMAT 'CUSTOM|TEXT' (formatter='pxfwritable_import' | delimiter='<delim>')
+```
+
+Hive-plug-in-specific keywords and values used in the [CREATE EXTERNAL 
TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described below.
+
+| Keyword  | Value |
+|----------|-------|
+| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
+| \<hive-db-name\>    | The name of the Hive database. If omitted, defaults to the Hive database named `default`. |
+| \<hive-table-name\>    | The name of the Hive table. |
+| PROFILE    | The `PROFILE` keyword must specify one of the values `Hive`, `HiveText`, or `HiveRC`. |
+| DELIMITER    | The `DELIMITER` clause is required for both the `HiveText` and `HiveRC` profiles and identifies the field delimiter used in the Hive data set. \<delim\> must be a single ASCII character or specified in hexadecimal representation. |
+| FORMAT (`Hive` profile)   | The `FORMAT` clause must specify `CUSTOM`. The `CUSTOM` format supports only the built-in `pxfwritable_import` `formatter`. |
+| FORMAT (`HiveText` and `HiveRC` profiles) | The `FORMAT` clause must specify `TEXT`. The `delimiter` must be specified a second time in '\<delim\>'. |
+
+
+## <a id="profile_hive"></a>Hive Profile
+
+The `Hive` profile works with any Hive file format. It can access heterogeneous format data in a single table where each partition may be stored as a different file format.
+
+While you can use the `Hive` profile to access any file format, the more 
specific profiles perform better for those single file format types.
+
+
+### <a id="profile_hive_using"></a>Example: Using the Hive Profile
+
+Use the `Hive` profile to create a queryable HAWQ external table from the Hive 
`sales_info` textfile format table created earlier.
+
+1. Create a queryable HAWQ external table from the Hive `sales_info` textfile 
format table created earlier:
+
+    ``` sql
+    postgres=# CREATE EXTERNAL TABLE salesinfo_hiveprofile(location text, month text, num_orders int, total_sales float8)
+                LOCATION ('pxf://namenode:51200/default.sales_info?PROFILE=Hive')
+              FORMAT 'custom' (formatter='pxfwritable_import');
+    ```
+
+2. Query the table:
+
+    ``` sql
+    postgres=# SELECT * FROM salesinfo_hiveprofile;
+    ```
+
+    ``` shell
+       location    | month | num_orders | total_sales
+    ---------------+-------+------------+-------------
+     Prague        | Jan   |        101 |     4875.33
+     Rome          | Mar   |         87 |     1557.39
+     Bangalore     | May   |        317 |     8936.99
+     ...
+
+    ```
+
+## <a id="profile_hivetext"></a>HiveText Profile
+
+Use the `HiveText` profile to query text format files. The `HiveText` profile 
is more performant than the `Hive` profile.
+
+**Note**: When using the `HiveText` profile, you *must* specify a delimiter 
option in *both* the `LOCATION` and `FORMAT` clauses.
+
+### <a id="profile_hivetext_using"></a>Example: Using the HiveText Profile
+
+Use the PXF `HiveText` profile to create a queryable HAWQ external table from 
the Hive `sales_info` textfile format table created earlier.
+
+1. Create the external table:
+
+    ``` sql
+    postgres=# CREATE EXTERNAL TABLE salesinfo_hivetextprofile(location text, month text, num_orders int, total_sales float8)
+                 LOCATION ('pxf://namenode:51200/default.sales_info?PROFILE=HiveText&DELIMITER=\x2c')
+               FORMAT 'TEXT' (delimiter=E',');
+    ```
+
+    (You can safely ignore the "nonstandard use of escape in a string literal" 
warning and related messages.)
+
+    Notice that:
+    - The `LOCATION` subclause `DELIMITER` value is specified in hexadecimal format. `\x` is a prefix that instructs PXF to interpret the following characters as hexadecimal. `2c` is the hex value for the comma character.
+    - The `FORMAT` subclause `delimiter` value is specified as the single ASCII comma character `','`. The `E` prefix marks the string as an escape string literal.
+
+2. Query the external table:
+
+    ``` sql
+    postgres=# SELECT * FROM salesinfo_hivetextprofile WHERE location='Beijing';
+    ```
+
+    ``` shell
+     location | month | num_orders | total_sales
+    ----------+-------+------------+-------------
+     Beijing  | Jul   |        411 |    11600.67
+     Beijing  | Dec   |        100 |     4248.41
+    (2 rows)
+    ```
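
As a quick sanity check of the hexadecimal delimiter notation used above, `\x2c` is simply the comma character expressed in hex; the following Python snippet illustrates the correspondence:

``` python
# 0x2C is the ASCII code point for the comma character.
assert chr(0x2C) == ","
# A tab delimiter would be written as \x09 in the LOCATION clause.
assert "\x09" == "\t"
print("delimiter hex values check out")
```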
+
+## <a id="profile_hiverc"></a>HiveRC Profile
+
+The RCFile Hive format is used for row columnar formatted data. The `HiveRC` 
profile provides access to RCFile data.
+
+### <a id="profile_hiverc_rcfiletbl_using"></a>Example: Using the HiveRC Profile
+
+Use the `HiveRC` profile to query RCFile-formatted data in Hive tables. The 
`HiveRC` profile is more performant than the `Hive` profile for this file 
format type.
+
+1. Create a Hive table with RCFile format:
+
+    ``` shell
+    $ HADOOP_USER_NAME=hdfs hive
+    ```
+
+    ``` sql
+    hive> CREATE TABLE sales_info_rcfile (location string, month string,
+            number_of_orders int, total_sales double)
+          ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+          STORED AS rcfile;
+    ```
+
+2. Insert the data from the `sales_info` table into `sales_info_rcfile`:
+
+    ``` sql
+    hive> INSERT INTO TABLE sales_info_rcfile SELECT * FROM sales_info;
+    ```
+
+    A copy of the sample data set is now stored in RCFile format in 
`sales_info_rcfile`. 
+    
+3. Perform a Hive query on `sales_info_rcfile` to verify that the data was 
loaded successfully:
+
+    ``` sql
+    hive> SELECT * FROM sales_info_rcfile;
+    ```
+
+4. Use the PXF `HiveRC` profile to create a queryable HAWQ external table from the Hive `sales_info_rcfile` table created in the previous step. When using the `HiveRC` profile, you **must** specify a delimiter option in *both* the `LOCATION` and `FORMAT` clauses:
+
+    ``` sql
+    postgres=# CREATE EXTERNAL TABLE salesinfo_hivercprofile(location text, month text, num_orders int, total_sales float8)
+                 LOCATION ('pxf://namenode:51200/default.sales_info_rcfile?PROFILE=HiveRC&DELIMITER=\x2c')
+               FORMAT 'TEXT' (delimiter=E',');
+               FORMAT 'TEXT' (delimiter=E',');
+    ```
+
+    (Again, you can safely ignore the "nonstandard use of escape in a string 
literal" warning and related messages.)
+
+5. Query the external table:
+
+    ``` sql
+    postgres=# SELECT location, total_sales FROM salesinfo_hivercprofile;
+    ```
+
+    ``` shell
+       location    | total_sales
+    ---------------+-------------
+     Prague        |     4875.33
+     Rome          |     1557.39
+     Bangalore     |     8936.99
+     Beijing       |    11600.67
+     ...
+    ```
+
+## <a id="topic_dbb_nz3_ts"></a>Accessing Parquet-Format Hive Tables
+
+The PXF `Hive` profile supports both non-partitioned and partitioned Hive 
tables that use the Parquet storage format in HDFS. Simply map the table 
columns using equivalent HAWQ data types. For example, if a Hive table is 
created using:
+
+``` sql
+hive> CREATE TABLE hive_parquet_table (fname string, lname string, custid int, acctbalance double)
+        STORED AS parquet;
+```
+
+Define the HAWQ external table using:
+
+``` sql
+postgres=# CREATE EXTERNAL TABLE pxf_parquet_table (fname text, lname text, custid int, acctbalance double precision)
+    LOCATION ('pxf://namenode:51200/hive-db-name.hive_parquet_table?profile=Hive')
+    FORMAT 'CUSTOM' (formatter='pxfwritable_import');
+```
+
+And query the HAWQ external table using:
+
+``` sql
+postgres=# SELECT fname,lname FROM pxf_parquet_table;
+```
+
+
+## <a id="profileperf"></a>Profile Performance Considerations
+
+The `HiveRC` and `HiveText` profiles are faster than the generic `Hive` 
profile.
+
+
+## <a id="complex_dt_example"></a>Complex Data Type Example
+
+This example will employ the array and map complex types, specifically an 
array of integers and a string key/value pair map.
+
+The data schema for this example includes fields with the following names and 
data types:
+
+| Field Name  | Data Type |
+|-------|---------------------------|
+| index | int |
+| name | string |
+| intarray | array of integers |
+| propmap | map of string key and value pairs |
+
+When specifying an array field in a Hive table, you must identify the 
terminator for each item in the collection. Similarly, the map key termination 
character must also be specified.
+
+1. Create a text file from which you will load the data set:
+
+    ```
+    $ vi /tmp/pxf_hive_complex.txt
+    ```
+
+2. Add the following data to `pxf_hive_complex.txt`. The data uses a comma `,` to separate field values, the percent symbol `%` to separate collection items, and a colon `:` to separate map keys from their values:
+
+    ```
+    3,Prague,1%2%3,zone:euro%status:up
+    89,Rome,4%5%6,zone:euro
+    400,Bangalore,7%8%9,zone:apac%status:pending
+    183,Beijing,0%1%2,zone:apac
+    94,Sacramento,3%4%5,zone:noam%status:down
+    101,Paris,6%7%8,zone:euro%status:up
+    56,Frankfurt,9%0%1,zone:euro
+    202,Jakarta,2%3%4,zone:apac%status:up
+    313,Sydney,5%6%7,zone:apac%status:pending
+    76,Atlanta,8%9%0,zone:noam%status:down
+    ```
+
+3. Create a Hive table to represent this data:
+
+    ``` shell
+    $ HADOOP_USER_NAME=hdfs hive
+    ```
+
+    ``` sql
+    hive> CREATE TABLE table_complextypes( index int, name string, intarray ARRAY<int>, propmap MAP<string, string>)
+             ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+             COLLECTION ITEMS TERMINATED BY '%'
+             MAP KEYS TERMINATED BY ':'
+             STORED AS TEXTFILE;
+    ```
+
+    Notice that:
+    - `FIELDS TERMINATED BY` identifies a comma as the field terminator.
+    - The `COLLECTION ITEMS TERMINATED BY` subclause specifies the percent 
sign as the collection items (array item, map key/value pair) terminator.
+    - `MAP KEYS TERMINATED BY` identifies a colon as the terminator for map 
keys.
+
+4. Load the `pxf_hive_complex.txt` sample data file into the 
`table_complextypes` table you just created:
+
+    ``` sql
+    hive> LOAD DATA LOCAL INPATH '/tmp/pxf_hive_complex.txt' INTO TABLE 
table_complextypes;
+    ```
+
+5. Perform a query on Hive table `table_complextypes` to verify that the data 
was loaded successfully:
+
+    ``` sql
+    hive> SELECT * FROM table_complextypes;
+    ```
+
+    ``` shell
+    3  Prague  [1,2,3] {"zone":"euro","status":"up"}
+    89 Rome    [4,5,6] {"zone":"euro"}
+    400        Bangalore       [7,8,9] {"zone":"apac","status":"pending"}
+    ...
+    ```
+
+6. Use the PXF `Hive` profile to create a queryable HAWQ external table 
representing the Hive `table_complextypes`:
+
+    ``` sql
+    postgres=# CREATE EXTERNAL TABLE complextypes_hiveprofile(index int, name text, intarray text, propmap text)
+                 LOCATION ('pxf://namenode:51200/table_complextypes?PROFILE=Hive')
+               FORMAT 'CUSTOM' (formatter='pxfwritable_import');
+               FORMAT 'CUSTOM' (formatter='pxfwritable_import');
+    ```
+
+    Notice that the integer array and map complex types are mapped to type 
text.
+
+7. Query the external table:
+
+    ``` sql
+    postgres=# SELECT * FROM complextypes_hiveprofile;
+    ```
+
+    ``` shell     
+     index |    name    | intarray |              propmap
+    -------+------------+----------+------------------------------------
+         3 | Prague     | [1,2,3]  | {"zone":"euro","status":"up"}
+        89 | Rome       | [4,5,6]  | {"zone":"euro"}
+       400 | Bangalore  | [7,8,9]  | {"zone":"apac","status":"pending"}
+       183 | Beijing    | [0,1,2]  | {"zone":"apac"}
+        94 | Sacramento | [3,4,5]  | {"zone":"noam","status":"down"}
+       101 | Paris      | [6,7,8]  | {"zone":"euro","status":"up"}
+        56 | Frankfurt  | [9,0,1]  | {"zone":"euro"}
+       202 | Jakarta    | [2,3,4]  | {"zone":"apac","status":"up"}
+       313 | Sydney     | [5,6,7]  | {"zone":"apac","status":"pending"}
+        76 | Atlanta    | [8,9,0]  | {"zone":"noam","status":"down"}
+    (10 rows)
+    ```
+
+    `intarray` and `propmap` are each returned as single text strings.
+
+## <a id="hcatalog"></a>Using PXF and HCatalog to Query Hive
+
+Hive tables can be queried directly through HCatalog integration with HAWQ and 
PXF, regardless of the underlying file storage format.
+
+In previous sections, you created an external table in PXF that described the 
target table's Hive metadata. Another option for querying Hive tables is to 
take advantage of HAWQ's integration with HCatalog. This integration allows 
HAWQ to directly use table metadata stored in HCatalog.
+
+HCatalog is built on top of the Hive metastore and incorporates Hive's DDL. 
This provides several advantages:
+
+-   You do not need to know the table schema of your Hive tables
+-   You do not need to manually enter information about Hive table location or 
format
+-   If Hive table metadata changes, HCatalog provides updated metadata. This 
is in contrast to the use of static external PXF tables to define Hive table 
metadata for HAWQ.
+
+The following diagram depicts how HAWQ integrates with HCatalog to query Hive 
tables:
+
+<img src="../images/hawq_hcatalog.png" id="hcatalog__image_ukw_h2v_c5" 
class="image" width="672" />
+
+1.  HAWQ retrieves table metadata from HCatalog using PXF.
+2.  HAWQ creates in-memory catalog tables from the retrieved metadata. If a table is referenced multiple times in a transaction, HAWQ uses its in-memory metadata to reduce external calls to HCatalog.
+3.  PXF queries Hive using table metadata that is stored in the HAWQ in-memory catalog tables. Table metadata is dropped at the end of the transaction.
+
+
+### <a id="topic_j1l_enabling"></a>Enabling HCatalog Integration
+
+To enable HCatalog query integration in HAWQ, perform the following steps:
+
+1.  Make sure your deployment meets the requirements listed in [Prerequisites](#installingthepxfhiveplugin).
+2.  If necessary, set the `pxf_service_address` global configuration property to the hostname or IP address and port where you have installed the PXF Hive plug-in. By default, the value is set to `localhost:51200`.
+
+    ``` sql
+    postgres=# SET pxf_service_address TO '<hivenode>:51200';
+    ```
+
+3.  HCatalog internally uses the `pxf` protocol to query. Grant this protocol privilege to all roles requiring access:
+
+    ``` sql
+    postgres=# GRANT ALL ON PROTOCOL pxf TO <role>;
+    ```
+
+4. It is not recommended to create a HAWQ table using the `WITH (OIDS)` clause. If any user tables were created using the `WITH (OIDS)` clause, additional operations are required to enable HCatalog integration: to access a Hive table via HCatalog, HAWQ users must have `SELECT` permission on every user table within the same schema that was created using the `WITH (OIDS)` clause.
+
+    1. Determine which user tables were created using the `WITH (OIDS)` clause:
+
+        ``` sql
+        postgres=# SELECT oid, relname FROM pg_class
+                     WHERE relhasoids = true
+                       AND relnamespace <> (SELECT oid FROM pg_namespace
+                                              WHERE nspname = 'pg_catalog');
+        ```
+
+    2. Grant `SELECT` privilege on all returned tables to all roles to which you chose to provide HCatalog query access. For example:
+
+        ``` sql
+        postgres=# GRANT SELECT ON <table-created-WITH-OIDS> TO <role>;
+        ```
+
+### <a id="topic_j1l_y55_c5"></a>Usage    
+
+To query a Hive table with HCatalog integration, query HCatalog directly from HAWQ. The query syntax is:
+
+``` sql
+postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name;
+```
+
+For example:
+
+``` sql
+postgres=# SELECT * FROM hcatalog.default.sales_info;
+```
+
+To obtain a description of a Hive table with HCatalog integration, you can use the `psql` client interface.
+
+-   Within HAWQ, use either the `\d hcatalog.hive-db-name.hive-table-name` or `\d+ hcatalog.hive-db-name.hive-table-name` command to describe a single table. `\d` displays only HAWQ's interpretation of the underlying source (Hive in this case) data type, while `\d+` displays both the HAWQ interpreted and Hive source data types. For example, from the `psql` client interface:
+
+    ``` shell
+    $ psql -d postgres
+    ```
+
+    ``` sql
+    postgres=# \d+ hcatalog.default.sales_info_rcfile
+    ```
+
+    ``` shell
+    PXF Hive Table "default.sales_info_rcfile"
+          Column      |  Type  | Source type 
+    ------------------+--------+-------------
+     location         | text   | string
+     month            | text   | string
+     number_of_orders | int4   | int
+     total_sales      | float8 | double
+    ```
+-   Use `\d hcatalog.hive-db-name.*` to describe the whole database schema, i.e. all tables in `hive-db-name`.
+-   Use `\d hcatalog.*.*` to describe the whole schema, i.e. all databases and tables.
+
+When using the `\d` or `\d+` commands in the `psql` HAWQ client, `hcatalog` will not be listed as a database. If you use other `psql`-compatible clients, `hcatalog` will be listed as a database with a size value of `-1`, since `hcatalog` is not a real database in HAWQ.
+
+Alternatively, you can use the `pxf_get_item_fields` user-defined function (UDF) to obtain Hive table descriptions from other client interfaces or third-party applications. The UDF takes a PXF profile and a table pattern string as its input parameters. **Note:** The only supported input profile at this time is `'Hive'`.
+
+- The following statement returns a description of a specific table. The description includes path, itemname (table), fieldname, and fieldtype.
+
+    ``` sql
+    postgres=# SELECT * FROM pxf_get_item_fields('Hive','default.sales_info_rcfile');
+    ```
+
+    ``` pre
+      path   |     itemname      |    fieldname     | fieldtype
+    ---------+-------------------+------------------+-----------
+     default | sales_info_rcfile | location         | text
+     default | sales_info_rcfile | month            | text
+     default | sales_info_rcfile | number_of_orders | int4
+     default | sales_info_rcfile | total_sales      | float8
+    ```
+
+- The following statement returns table descriptions from the default database.
+
+    ``` sql
+    postgres=# SELECT * FROM pxf_get_item_fields('Hive','default.*');
+    ```
+
+- The following statement returns a description of the entire schema.
+
+    ``` sql
+    postgres=# SELECT * FROM pxf_get_item_fields('Hive', '*.*');
+    ```
+
+### <a id="topic_r5k_pst_25"></a>Limitations
+
+HCatalog integration has the following limitations:
+
+-   HCatalog integration queries and describe commands do not support complex types; only primitive types are supported. Use PXF external tables to query complex types in Hive. (See [Complex Types Example](#complex_dt_example).)
+-   Even for primitive types, HCatalog metadata descriptions produced by `\d` are HAWQ's interpretation of the underlying Hive data types. For example, the Hive type `tinyint` is converted to HAWQ type `int2`. (See [Data Type Mapping](#hive_primdatatypes).)
+-   HAWQ reserves the database name `hcatalog` for system use. You cannot connect to or alter the system `hcatalog` database.
+
+## <a id="partitionfiltering"></a>Partition Filtering
+
+The PXF Hive plug-in supports the Hive partitioning feature and directory structure. This enables partition exclusion on selected HDFS files comprising the Hive table. To use the partition filtering feature to reduce network traffic and I/O, run a PXF query using a `WHERE` clause that refers to a specific partition in the partitioned Hive table.
+
+To take advantage of PXF partition filtering push-down, the Hive and PXF partition field names should be the same. Otherwise, PXF ignores partition filtering, and the filtering is performed on the HAWQ side, impacting performance.
+
+**Note:** The Hive plug-in filters only on partition columns, not on other table attributes.
+
+### <a id="partitionfiltering_pushdowncfg"></a>Configure Partition Filtering 
Push-Down
+
+PXF partition filtering push-down is enabled by default. To disable PXF 
partition filtering push-down, set the `pxf_enable_filter_pushdown` HAWQ server 
configuration parameter to `off`:
+
+``` sql
+postgres=# SHOW pxf_enable_filter_pushdown;
+ pxf_enable_filter_pushdown
+-----------------------------
+ on
+(1 row)
+postgres=# SET pxf_enable_filter_pushdown=off;
+```
+
+### <a id="example2"></a>Create Partitioned Hive Table
+
+Create a Hive table `sales_part` with two partition columns, `delivery_state` and `delivery_city`:
+
+``` sql
+hive> CREATE TABLE sales_part (name string, type string, supplier_key int, price double)
+        PARTITIONED BY (delivery_state string, delivery_city string)
+        ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
+```
+
+Load data into this Hive table and add some partitions:
+
+``` sql
+hive> INSERT INTO TABLE sales_part 
+        PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Fresno') 
+        VALUES ('block', 'widget', 33, 15.17);
+hive> INSERT INTO TABLE sales_part 
+        PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Sacramento') 
+        VALUES ('cube', 'widget', 11, 1.17);
+hive> INSERT INTO TABLE sales_part 
+        PARTITION(delivery_state = 'NEVADA', delivery_city = 'Reno') 
+        VALUES ('dowel', 'widget', 51, 31.82);
+hive> INSERT INTO TABLE sales_part 
+        PARTITION(delivery_state = 'NEVADA', delivery_city = 'Las Vegas') 
+        VALUES ('px49', 'pipe', 52, 99.82);
+```
+
+The Hive storage directory structure for the `sales_part` table appears as follows:
+
+``` pre
+$ sudo -u hdfs hdfs dfs -ls -R /apps/hive/warehouse/sales_part
+/apps/hive/warehouse/sales_part/delivery_state=CALIFORNIA/delivery_city=Fresno/
+/apps/hive/warehouse/sales_part/delivery_state=CALIFORNIA/delivery_city=Sacramento/
+/apps/hive/warehouse/sales_part/delivery_state=NEVADA/delivery_city=Reno/
+/apps/hive/warehouse/sales_part/delivery_state=NEVADA/delivery_city=Las Vegas/
+```
+
+To define a HAWQ PXF table that will read this Hive table and take advantage of partition filter push-down, define the fields corresponding to the Hive partition fields at the end of the `CREATE EXTERNAL TABLE` attribute list. In HiveQL, a `SELECT *` statement on a partitioned table shows the partition fields at the end of the record.
+
+``` sql
+postgres=# CREATE EXTERNAL TABLE pxf_sales_part(
+  item_name TEXT, 
+  item_type TEXT, 
+  supplier_key INTEGER, 
+  item_price DOUBLE PRECISION, 
+  delivery_state TEXT, 
+  delivery_city TEXT
+)
+LOCATION ('pxf://namenode:51200/sales_part?Profile=Hive')
+FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
+
+postgres=# SELECT * FROM pxf_sales_part;
+```
+
+### <a id="example3"></a>Query Without Pushdown
+
+In the following example, the HAWQ query filters on the `delivery_city` partition `Sacramento`. The filter on `item_name` is not pushed down, since it is not a partition column. It is performed on the HAWQ side after all the data for `Sacramento` is transferred for processing.
+
+``` sql
+postgres=# SELECT * FROM pxf_sales_part WHERE delivery_city = 'Sacramento' AND item_name = 'cube';
+```
+
+### <a id="example4"></a>Query With Pushdown
+
+The following HAWQ query reads all the data under the `delivery_state` partition `CALIFORNIA`, regardless of the city.
+
+``` sql
+postgres=# SET pxf_enable_filter_pushdown=on;
+postgres=# SELECT * FROM pxf_sales_part WHERE delivery_state = 'CALIFORNIA';
+```
+
+## <a id="topic_fdm_zrh_1s"></a>Using PXF with Hive Default Partitions
+
+This topic describes a difference in query results between Hive and PXF queries when Hive tables use a default partition. When dynamic partitioning is enabled in Hive, a partitioned table may store data in a default partition. Hive creates a default partition when the value of a partitioning column does not match the defined type of the column (for example, when a NULL value is used for any partitioning column). In Hive, any query that includes a filter on a partition column *excludes* any data that is stored in the table's default partition.
+
+Similar to Hive, PXF represents a table's partitioning columns as columns that are appended to the end of the table. However, PXF translates any column value in a default partition to a NULL value. This means that a HAWQ query that includes an IS NULL filter on a partitioning column can return different results than the same Hive query.
+
+Consider a Hive partitioned table that is created with the statement:
+
+``` sql
+hive> CREATE TABLE sales (order_id bigint, order_amount float) PARTITIONED BY (xdate date);
+```
+
+The table is loaded with five rows that contain the following data:
+
+``` pre
+1.0    1900-01-01
+2.2    1994-04-14
+3.3    2011-03-31
+4.5    NULL
+5.0    2013-12-06
+```
+
+The insertion of row 4 creates a Hive default partition, because the partition column `xdate` contains a null value.
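+
+As an illustrative sketch (dynamic partitioning must be enabled; the `order_id` value `4` is an assumption, since the sample data does not show order IDs), a load that produces row 4 could look like the following. Hive stores such a row under the directory name configured by `hive.exec.default.partition.name`, which defaults to `__HIVE_DEFAULT_PARTITION__`:
+
+``` sql
+hive> SET hive.exec.dynamic.partition=true;
+hive> SET hive.exec.dynamic.partition.mode=nonstrict;
+hive> INSERT INTO TABLE sales PARTITION (xdate)
+        VALUES (4, 4.5, NULL);
+```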
+
+In Hive, any query that filters on the partition column omits data in the default partition. For example, the following query returns no rows:
+
+``` sql
+hive> SELECT * FROM sales WHERE xdate is null;
+```
+
+However, if you map this table as a PXF external table in HAWQ, all default partition values are translated into actual NULL values. In HAWQ, executing the same query against the PXF table returns row 4 as the result, because the filter matches the NULL value.
+
+Keep this behavior in mind when executing IS NULL queries on Hive partitioned tables.
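+
+As a sketch (the `pxf_sales` table name and `namenode:51200` location follow the earlier `pxf_sales_part` example and are assumptions, not output from a real cluster), the HAWQ external table for `sales` appends the partition column at the end of the attribute list:
+
+``` sql
+postgres=# CREATE EXTERNAL TABLE pxf_sales (
+  order_id BIGINT,
+  order_amount FLOAT8,
+  xdate DATE
+)
+LOCATION ('pxf://namenode:51200/sales?Profile=Hive')
+FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
+
+-- Unlike the equivalent Hive query, this returns row 4, because
+-- the default-partition value is read as NULL:
+postgres=# SELECT * FROM pxf_sales WHERE xdate IS NULL;
+```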
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/pxf/InstallPXFPlugins.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/pxf/InstallPXFPlugins.html.md.erb 
b/markdown/pxf/InstallPXFPlugins.html.md.erb
new file mode 100644
index 0000000..4ae4101
--- /dev/null
+++ b/markdown/pxf/InstallPXFPlugins.html.md.erb
@@ -0,0 +1,81 @@
+---
+title: Installing PXF Plug-ins
+---
+
+This topic describes how to install the built-in PXF service plug-ins that are required to connect PXF to HDFS, Hive, HBase, and JSON.
+
+**Note:** PXF requires that you run Tomcat on the host machine. Tomcat reserves ports 8005, 8080, and 8009. If you have configured Oozie JMX reporting on a host that will run PXF, make sure that the reporting service uses a port other than 8005. This helps to prevent port conflict errors from occurring when you start the PXF service.
+
+## <a id="directories_and_logs"></a>PXF Installation and Log File Directories
+
+Installing PXF plug-ins, regardless of method, creates directories and log files on each node receiving the plug-in installation:
+
+| Directory | Description |
+|-----------|-------------|
+| `/usr/lib/pxf` | PXF library location |
+| `/etc/pxf/conf` | PXF configuration directory. This directory contains the `pxf-public.classpath` and `pxf-private.classpath` configuration files. See [Setting up the Java Classpath](ConfigurePXF.html#settingupthejavaclasspath). |
+| `/var/pxf/pxf-service` | PXF service instance location |
+| `/var/log/pxf` | This directory includes `pxf-service.log` and all Tomcat-related logs, including `catalina.out`. Logs are owned by user:group `pxf`:`pxf`. Other users have read access. |
+| `/var/run/pxf/catalina.pid` | PXF Tomcat container PID location |
+
+
+## <a id="install_pxf_plug_ambari"></a>Installing PXF Using Ambari
+
+If you are using Ambari to install and manage your HAWQ cluster, you do *not* need to follow the manual installation steps in this topic. Installing via the Ambari web interface installs all of the necessary PXF plug-in components.
+
+## <a id="install_pxf_plug_cmdline"></a>Installing PXF from the Command Line
+
+Each PXF service plug-in resides in its own RPM. You may have built these RPMs in the Apache HAWQ open source project repository (see [PXF Build Instructions](https://github.com/apache/incubator-hawq/blob/master/pxf/README.md)), or these RPMs may have been included in a commercial product download package.
+
+Perform the following steps on **_each_** node in your cluster to install PXF:
+
+1. Install the PXF software, including Apache Tomcat, the PXF service, and all PXF plug-ins (HDFS, HBase, Hive, JSON):
+
+    ```shell
+    $ sudo yum install -y pxf
+    ```
+
+    Installing PXF in this manner:
+    * installs the required version of `apache-tomcat`
+    * creates a `/etc/pxf/pxf-n.n.n` directory, adding a softlink from `/etc/pxf` to this directory
+    * sets up the PXF service configuration files in `/etc/pxf`
+    * creates a `/usr/lib/pxf-n.n.n` directory, adding a softlink from `/usr/lib/pxf` to this directory
+    * copies the PXF service JAR file `pxf-service-n.n.n.jar` to `/usr/lib/pxf-n.n.n/`
+    * copies the JAR files for each of the PXF plug-ins to `/usr/lib/pxf-n.n.n/`
+    * creates `pxf-xxx.jar` softlinks in `/usr/lib/pxf-n.n.n/`
+
+2. Initialize the PXF service:
+
+    ```shell
+    $ sudo service pxf-service init
+    ```
+
+3. Start the PXF service:
+
+    ```shell
+    $ sudo service pxf-service start
+    ```
+    
+    Additional `pxf-service` command options include `stop`, `restart`, and `status`.
+
+4. If you choose to use the HBase plug-in, perform the following configuration:
+
+    1. Add the PXF HBase plug-in JAR file to the HBase `CLASSPATH` by updating the `HBASE_CLASSPATH` environment variable setting in the HBase environment file `/etc/hbase/conf/hbase-env.sh`:
+
+        ``` shell
+        export HBASE_CLASSPATH=${HBASE_CLASSPATH}:/usr/lib/pxf/pxf-hbase.jar
+        ```
+
+    2. Restart the HBase service after making this update to the HBase configuration.
+
+        On the HBase Master node:
+
+        ``` shell
+        $ su -l hbase -c "/usr/hdp/current/hbase-master/bin/hbase-daemon.sh restart master; sleep 25"
+        ```
+
+        On an HBase Region Server node:
+
+        ```shell
+        $ su -l hbase -c "/usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh restart regionserver"
+        ```
