[1/6] incubator-hawq-docs git commit: Updates for hawq register

yozie Fri, 30 Sep 2016 16:48:11 -0700

Repository: incubator-hawq-docs
Updated Branches:
  refs/heads/develop 10cde80c9 -> 81b1ef862



Updates for hawq register


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: 
http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/deb1c4b5
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/deb1c4b5
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/deb1c4b5

Branch: refs/heads/develop
Commit: deb1c4b5b26cba9691c27ddb86ca4b980fdf0a00
Parents: 29a7f42
Author: Jane Beckman <[email protected]>
Authored: Wed Sep 28 10:44:07 2016 -0700
Committer: Jane Beckman <[email protected]>
Committed: Wed Sep 28 10:44:07 2016 -0700

----------------------------------------------------------------------
 .../load/g-loading-and-unloading-data.html.md   |  58 +++++
 datamgmt/load/g-register_files.html.md.erb      | 214 +++++++++++++++++++
 .../admin_utilities/hawqregister.html.md.erb    | 175 ++++++++++-----
 3 files changed, 390 insertions(+), 57 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/deb1c4b5/datamgmt/load/g-loading-and-unloading-data.html.md
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-loading-and-unloading-data.html.md 
b/datamgmt/load/g-loading-and-unloading-data.html.md
new file mode 100644
index 0000000..012e5b6
--- /dev/null
+++ b/datamgmt/load/g-loading-and-unloading-data.html.md
@@ -0,0 +1,58 @@
+---
+title: Loading and Unloading Data
+---
+
+The topics in this section describe methods for loading and writing data into 
and out of HAWQ, and how to format data files. It also covers registering HDFS 
files and folders directly into HAWQ internal tables.
+
+HAWQ supports high-performance parallel data loading and unloading, and for 
smaller amounts of data, single file, non-parallel data import and export.
+
+HAWQ can read from and write to several types of external data sources, 
including text files, Hadoop file systems, and web servers.
+
+-   The `COPY` SQL command transfers data between an external text file on the 
master host and a HAWQ database table.
+-   External tables allow you to query data outside of the database directly 
and in parallel using SQL commands such as `SELECT`, `JOIN`, or 
`SORT EXTERNAL TABLE DATA`, and you can create views for external tables. 
External tables are often used to load external data into a regular database 
table using a command such as `CREATE TABLE table AS SELECT * FROM ext_table`.
+-   External web tables provide access to dynamic data. They can be backed 
with data from URLs accessed using the HTTP protocol or by the output of an OS 
script running on one or more segments.
+-   The `gpfdist` utility is the HAWQ parallel file distribution program. It 
is an HTTP server that is used with external tables to allow HAWQ segments to 
load external data in parallel, from multiple file systems. You can run 
multiple instances of `gpfdist` on different hosts and network interfaces and 
access them in parallel.
+-   The `hawq load` utility automates the steps of a load task using a 
YAML-formatted control file.
+
+The method you choose to load data depends on the characteristics of the 
source data: its location, size, format, and any transformations required.
+
+In the simplest case, the `COPY` SQL command loads data into a table from a 
text file that is accessible to the HAWQ master instance. This requires no 
setup and provides good performance for smaller amounts of data. With the 
`COPY` command, the data copied into or out of the database passes between a 
single file on the master host and the database. This limits the total size of 
the dataset to the capacity of the file system where the external file resides 
and limits the data transfer to a single file write stream.
+
+More efficient data loading options for large datasets take advantage of the 
HAWQ MPP architecture, using the HAWQ segments to load data in parallel. These 
methods allow data to load simultaneously from multiple file systems, through 
multiple NICs, on multiple hosts, achieving very high data transfer rates. 
External tables allow you to access external files from within the database as 
if they are regular database tables. When used with `gpfdist`, the HAWQ 
parallel file distribution program, external tables provide full parallelism by 
using the resources of all HAWQ segments to load or unload data.
+
+The `hawq register` utility allows you to:
+
+-  Load and register file data generated by an external system such as Hive or 
Spark into HAWQ internal tables.
+-  Recover cluster data from a backup cluster for disaster recovery, using a 
YAML file.
+
+HAWQ leverages the parallel architecture of the Hadoop Distributed File System 
(HDFS) to access files on that system.
+
+-   **[Working with File-Based External 
Tables](../../datamgmt/load/g-working-with-file-based-ext-tables.html)**
+
+-   **[Using the Greenplum Parallel File Server 
(gpfdist)](../../datamgmt/load/g-using-the-greenplum-parallel-file-server--gpfdist-.html)**
+
+-   **[Creating and Using Web External 
Tables](../../datamgmt/load/g-creating-and-using-web-external-tables.html)**
+
+-   **[Loading Data Using an External 
Table](../../datamgmt/load/g-loading-data-using-an-external-table.html)**
+
+-   **[Loading and Writing Non-HDFS Custom 
Data](../../datamgmt/load/g-loading-and-writing-non-hdfs-custom-data.html)**
+
+-   **[Creating External Tables - 
Examples](../../datamgmt/load/creating-external-tables-examples.html#topic44)**
+
+-   **[Handling Load Errors](../../datamgmt/load/g-handling-load-errors.html)**
+
+-   **[Loading Data with hawq 
load](../../datamgmt/load/g-loading-data-with-hawqload.html)**
+
+-   **[Loading Data with 
COPY](../../datamgmt/load/g-loading-data-with-copy.html)**
+
+-   **[Running COPY in Single Row Error Isolation 
Mode](../../datamgmt/load/g-running-copy-in-single-row-error-isolation-mode.html)**
+
+-   **[Optimizing Data Load and Query 
Performance](../../datamgmt/load/g-optimizing-data-load-and-query-performance.html)**
+
+-   **[Unloading Data from 
HAWQ](../../datamgmt/load/g-unloading-data-from-greenplum-database.html)**
+
+-   **[Transforming XML 
Data](../../datamgmt/load/g-transforming-xml-data.html)**
+
+-   **[Formatting Data 
Files](../../datamgmt/load/g-formatting-data-files.html)**
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/deb1c4b5/datamgmt/load/g-register_files.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-register_files.html.md.erb 
b/datamgmt/load/g-register_files.html.md.erb
new file mode 100644
index 0000000..9abcbe2
--- /dev/null
+++ b/datamgmt/load/g-register_files.html.md.erb
@@ -0,0 +1,214 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \<hdfsfilepath\> is loaded into 
the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO or Parquet-formatted files 
in HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the client server are:
+
+-   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client configured and the hdfs filepath specified.
+-   The files to be registered and the HAWQ table must be located in the same 
HDFS cluster.
+-   The target table DDL is configured with the correct data type mapping.
+
+##Registering Externally Generated HDFS File Data to an Existing Table<a 
id="topic1__section2"></a>
+
+Files or folders in HDFS can be registered into an existing table, allowing 
them to be managed as a HAWQ internal table. When registering files, you can 
optionally specify the maximum amount of data to be loaded, in bytes, using the 
`--eof` option. If registering a folder, the actual file sizes are used. 
+
+Only HAWQ or Hive-generated Parquet tables are supported. Partitioned tables 
are not supported. Attempting to register these tables will result in an error.
+
+Metadata for the Parquet file(s) and the destination table must be consistent. 
HAWQ tables and Parquet files use different data types, so data must be mapped. 
You must verify that the structure of the Parquet files and the structure of 
the HAWQ table are compatible before running `hawq register`. 
+
+We recommend creating a copy of the Parquet file to be registered before 
running ```hawq register```.
+You can then run ```hawq register``` on the copy, leaving the original 
file available for additional Hive queries or for recovery if a data mapping 
error is encountered.
+
+###Limitations for Registering Hive Tables to HAWQ
+The Hive data types currently supported for registering Hive tables into HAWQ 
tables are: boolean, int, smallint, tinyint, bigint, float, double, string, 
binary, char, and varchar.
+
+The following Hive data types cannot be converted to HAWQ equivalents: 
timestamp, decimal, array, struct, map, and union.
+
+###Example: Registering a Hive-Generated Parquet File
+
+This example shows how to register a Hive-generated Parquet file in HDFS into 
the table `parquet_table` in HAWQ, which is in the database named `postgres`. 
The file path of the Hive-generated file is 
`hdfs://localhost:8020/temp/hive.paq`.
+
+In this example, the location of the database is 
`hdfs://localhost:8020/hawq_default`, the tablespace id is 16385, the database 
id is 16387, the table filenode id is 77160, and the last file under the 
filenode is numbered 7.
+
+Enter:
+
+``` pre
+$ hawq register postgres -f hdfs://localhost:8020/temp/hive.paq parquet_table
+```
+
+After running the `hawq register` command for the file location  
`hdfs://localhost:8020/temp/hive.paq`, the corresponding new location of the 
file in HDFS is:  `hdfs://localhost:8020/hawq_default/16385/16387/77160/8`. 
+
+The command then updates the metadata of the table `parquet_table` in HAWQ, 
which is contained in the table `pg_aoseg.pg_paqseg_77160`. The pg\_aoseg is a 
fixed schema for row-oriented and Parquet AO tables. For row-oriented tables, 
the table name prefix is pg\_aoseg. The table name prefix for parquet tables is 
pg\_paqseg. 77160 is the relation id of the table.
+
+To locate the table, either find the relation ID by looking up the catalog 
table pg\_class in SQL by running 
+
+```
+select oid from pg_class where relname=$relname
+```
+or find the table name by using the SQL command 
+```
+select segrelid from pg_appendonly where relid = $relid
+```
+then running 
+```
+select relname from pg_class where oid = segrelid
+```
+
+##Registering Data Using Information from a .yml Configuration File<a 
id="topic1__section3"></a>
+ 
+The `hawq register` command can register HDFS files using metadata loaded 
from a .yml configuration file, which is specified with the 
`--config <yml_config>` option. 
Both AO and Parquet tables can be registered. Tables need not exist in HAWQ 
before being registered. This function can be useful in disaster recovery, 
allowing information created by the `hawq extract` command to be used to 
re-create HAWQ tables.
+
+You can also use a .yml configuration file to append HDFS files to an existing 
HAWQ table or create a table and register it into HAWQ.
+
+For disaster recovery, tables can be re-registered using the HDFS files and a 
.yml file. The clusters are assumed to have data periodically imported from 
Cluster A to Cluster B. 
+
+Data is registered according to the following conditions: 
+
+-  If the table exists, the files are appended to the existing HAWQ table.
+-  If a table does not exist, it is created and registered into HAWQ. The 
catalog table will be updated with the file size specified by the .yml file.
+-  If the --force option is used, the data in existing catalog tables is 
erased and re-registered. All HDFS-related catalog contents in 
`pg_aoseg.pg_paqseg_$relid` are cleared. The original files on HDFS are 
retained.
+-  If the --repair option is used, data is rolled back to a previous state, as 
specified in the .yml file. Any files generated after the checkpoint specified 
in the .yml file will be erased. Both the file on HDFS and its metadata are 
erased.
+
+Tables using random distribution are preferred for registering into HAWQ. If 
hash tables are to be  registered, the distribution policy in the .yml file 
must match that of the table being registered into. 
+
+In registering hash tables, the size of the registered file should be 
identical to or a multiple of the hash table bucket number. When registering 
hash distributed tables using a .yml file, the order of the files in the .yml 
file should reflect the hash distribution.
+
+
+###Example: Registration using a .yml Configuration File
+
+This example shows how to use hawq register to register HDFS data using a .yml 
configuration file generated by hawq extract. 
+
+First, create a table in SQL and insert some data into it.  
+
+```
+create table paq1(a int, b varchar(10))with(appendonly=true, 
orientation=parquet);
+```
+
+In SQL, run:
+
+```
+insert into paq1 values(generate_series(1,1000), 'abcde');
+```
+
+Go into the hawq administration utilities, and extract the table metadata by 
using the `hawq extract` utility.
+
+```
+hawq extract -o paq1.yml paq1
+```
+
+Register the data into new table paq2, using the --config option to identify 
the .yml file.
+
+```
+hawq register --config paq1.yml paq2
+```
+In SQL, query the new table to verify that the content has been 
registered.
+
+```
+select count(*) from paq2;
+```
+
+
+##Data Type Mapping<a id="topic1__section4"></a>
+
+HAWQ tables and Parquet files use different data types, as do Hive and HAWQ 
tables. Mapping must be used for metadata compatibility. You are responsible 
for making sure your implementation is mapped to the appropriate data type 
before running `hawq register`. The tables below show equivalent data types, if 
available.
+
+<span class="tablecap">Table 1. HAWQ to Parquet Mapping</span>
+
+|HAWQ Data Type   | Parquet Data Type  |
+| :------------| :---------------|
+| bool        | boolean       |
+| int2/int4/date        | int32       |
+| int8/money       | int64      |
+| time/timestamptz/timestamp       | int64      |
+| float4        | float       |
+|float8        | double       |
+|bit/varbit/bytea/numeric       | Byte array       |
+|char/bpchar/varchar/name| Byte array |
+| text/xml/interval/timetz  | Byte array  |
+| macaddr/inet/cidr  | Byte array  |
+
+**Additional HAWQ-to-Parquet Mapping**
+
+**point:**
+
+``` 
+group {
+    required int x;
+    required int y;
+}
+```
+
+**circle:** 
+
+```
+group {
+    required int x;
+    required int y;
+    required int r;
+}
+```
+
+**box:**  
+
+```
+group {
+    required int x1;
+    required int y1;
+    required int x2;
+    required int y2;
+}
+```
+
+**lseg:** 
+
+
+```
+group {
+    required int x1;
+    required int y1;
+    required int x2;
+    required int y2;
+}
+``` 
+
+**path:**
+  
+```
+group {
+    repeated group {
+        required int x;
+        required int y;
+    }
+}
+```
+
+
+<span class="tablecap">Table 2. HIVE to HAWQ Mapping</span>
+
+|HIVE Data Type   | HAWQ Data Type  |
+| :------------| :---------------|
+| boolean        | bool       |
+| tinyint        | int2       |
+| smallint       | int2/smallint      |
+| int            | int4 / int |
+| bigint         | int8 / bigint      |
+| float        | float4       |
+| double       | float8 |
+| string        | varchar       |
+| binary      | bytea       |
+| char | char |
+| varchar  | varchar  |
+
+
+
+
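As a reading aid for the g-register_files example above, the following Python sketch reproduces the path arithmetic that the example describes: the registered file lands under the database location, followed by the tablespace id, database id, and table filenode id, and takes the next file number after the last existing one. The helper names `registered_path` and `segment_catalog_table` are hypothetical and are not part of `hawq register`. The "last file number plus one" rule is an assumption inferred from the single example in the text (7 becomes 8).

```python
# Hypothetical helpers for illustration only; hawq register does this work itself.

def registered_path(db_location, tablespace_id, database_id, filenode_id, last_file_no):
    """Return the HDFS path a newly registered file is expected to occupy.

    Assumption: the new file takes the number after the last existing file.
    """
    next_file_no = last_file_no + 1
    parts = [db_location.rstrip("/"), tablespace_id, database_id, filenode_id, next_file_no]
    return "/".join(str(part) for part in parts)


def segment_catalog_table(relation_id, parquet=True):
    """Return the catalog table that holds the file metadata for a table.

    The schema is always pg_aoseg; the table name prefix is pg_paqseg for
    Parquet tables and pg_aoseg for row-oriented tables.
    """
    prefix = "pg_paqseg" if parquet else "pg_aoseg"
    return "pg_aoseg.{}_{}".format(prefix, relation_id)


# Values taken from the example in the text.
print(registered_path("hdfs://localhost:8020/hawq_default", 16385, 16387, 77160, 7))
# -> hdfs://localhost:8020/hawq_default/16385/16387/77160/8
print(segment_catalog_table(77160))
# -> pg_aoseg.pg_paqseg_77160
```

The catalog lookups shown in the text (`pg_class`, `pg_appendonly`) remain the authoritative way to find the relation id on a live system; the sketch only shows how the pieces of the path and table name fit together.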

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/deb1c4b5/reference/cli/admin_utilities/hawqregister.html.md.erb
----------------------------------------------------------------------
diff --git a/reference/cli/admin_utilities/hawqregister.html.md.erb 
b/reference/cli/admin_utilities/hawqregister.html.md.erb
index 1eeaa74..38b88aa 100644
--- a/reference/cli/admin_utilities/hawqregister.html.md.erb
+++ b/reference/cli/admin_utilities/hawqregister.html.md.erb
@@ -2,18 +2,29 @@
 title: hawq register
 ---
 
-Loads and registers external parquet-formatted data in HDFS into a 
corresponding table in HAWQ.
+Loads and registers 
+AO or Parquet-formatted data in HDFS into a corresponding table in HAWQ.
 
 ## <a id="topic1__section2"></a>Synopsis
 
 ``` pre
-hawq register <databasename> <tablename> <hdfspath> 
+Usage 1:
+hawq register [<connection_options>] [-f <hdfsfilepath>] [-e <eof>] <tablename>
+
+Usage 2:
+hawq register [<connection_options>] [-c <configfilepath>] [--force] <tablename>
+
+Connection Options:
      [-h <hostname>] 
      [-p <port>] 
      [-U <username>] 
      [-d <database>]
-     [-t <tablename>] 
+     
+Misc. Options:
      [-f <filepath>] 
+        [-e <eof>]
+        [--force] 
+        [--repair]
      [-c <yml_config>]  
 hawq register help | -? 
 hawq register --version
@@ -21,83 +32,54 @@ hawq register --version
 
 ## <a id="topic1__section3"></a>Prerequisites
 
-The client machine where `hawq register` is executed must have the following:
+The client machine where `hawq register` is executed must meet the following 
conditions:
 
 -   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client must be configured and the hdfs filepath specified.
 -   The files to be registered and the HAWQ table located in the same HDFS 
cluster.
 -   The target table DDL is configured with the correct data type mapping.
 
 ## <a id="topic1__section4"></a>Description
 
-`hawq register` is a utility that loads and registers existing or external 
parquet data in HDFS into HAWQ, so that it can be directly ingested and 
accessed through HAWQ. Parquet data from the file or directory in the specified 
path is loaded into the appropriate HAWQ table directory in HDFS and the 
utility updates the corresponding HAWQ metadata for the files. 
+`hawq register` is a utility that loads and registers existing data files or 
folders in HDFS into HAWQ internal tables. HAWQ can then read the data directly 
and apply internal table processing, such as transactions, with high 
performance, without needing to load or copy the data. Data from the file or 
directory specified by \<hdfsfilepath\> is loaded into the appropriate HAWQ 
table directory in HDFS and the utility updates the corresponding HAWQ metadata 
for the files. 
 
-Only parquet tables can be loaded using the `hawq register` command. Metadata 
for the parquet file(s) and the destination table must be consistent. Different 
 data types are used by HAWQ tables and parquet tables, so the data is mapped. 
You must verify that the structure of the parquet files and the HAWQ table are 
compatible before running `hawq register`. 
+You can use `hawq register` to:
 
-Note: only HAWQ or HIVE-generated parquet tables are currently supported.
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster.
 
-###Limitations for Registering Hive Tables to HAWQ
-The currently-supported data types for generating Hive tables into HAWQ tables 
are: boolean, int, smallint, tinyint, bigint, float, double, string, binary, 
char, and varchar.  
+Two usage models are available.
 
-The following HIVE data types cannot be converted to HAWQ equivalents: 
timestamp, decimal, array, struct, map, and union.   
+###Usage Model 1: Register file data to an existing table
 
+`hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-f 
filepath] [-e eof] <tablename>`
 
-## <a id="topic1__section5"></a>Options
-
-**General Options**
-
-<dt>-? (show help) </dt>  
-<dd>Show help, then exit.
-
-<dt>-\\\-version  </dt> 
-<dd>Show the version of this utility, then exit.</dd>
-
-
-**Connection Options**
-
-<dt>-h \<hostname\> </dt>
-<dd>Specifies the host name of the machine on which the HAWQ master database 
server is running. If not specified, reads from the environment variable 
`$PGHOST` or defaults to `localhost`.</dd>
-
-<dt> -p \<port\> </dt> 
-<dd>Specifies the TCP port on which the HAWQ master database server is 
listening for connections. If not specified, reads from the environment 
variable `$PGPORT` or defaults to 5432.</dd>
+Metadata for the Parquet file(s) and the destination table must be consistent. 
HAWQ tables and Parquet files use different data types, so the data must be 
mapped. Refer to the section [Data Type 
Mapping](hawqregister.html#topic1__section7) below. You must verify that the 
structure of the Parquet files and the HAWQ table are compatible before running 
`hawq register`. 
 
-<dt>-U \<username\> </dt> 
-<dd>The database role name to connect as. If not specified, reads from the 
environment variable `$PGUSER` or defaults to the current system user name.</dd>
+####Limitations
+Only HAWQ or Hive-generated Parquet tables are supported.
+Hash tables and partitioned tables are not supported in this usage model.
 
-<dt>-d  , --database \<databasename\>  </dt>
-<dd>The database to register the parquet HDFS data into. The default is 
`postgres`<dd>
+###Usage Model 2: Use information from a .yml configuration file to register 
data
  
-<dt>-t , --tablename \<tablename\> </dt>
-<dd>The HAWQ table that will store the parquet data. The table cannot use hash 
distribution: only tables using random distribution can be registered into 
HAWQ.</dd>
-
-<dt>-f , --filepath \<hdfspath\></dt>
-<dd>The path of the file or directory in HDFS containing the files to be 
registered.</dd>
-
-<dt>-c , --config \<yml_config\> </dt> 
-<dd>Registers a YAML-format configuration file into HAWQ.</dd>
-
-
+`hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-c 
configfile] [--force] <tablename>`
 
-## <a id="topic1__section6"></a>Examples
+HDFS files are registered using metadata from a .yml configuration file, which 
is usually generated by the `hawq extract` command. Both AO and Parquet tables 
can be registered. Tables need not exist in HAWQ before being registered.
 
-This example shows how to register a HIVE-generated parquet file in HDFS into 
the table `parquet_table` in HAWQ, which is in the database named `postgres`. 
The file path of the HIVE-generated file is 
`hdfs://localhost:8020/temp/hive.paq`.
-
-For the purposes of this example, assume that the location of the database is 
`hdfs://localhost:8020/hawq_default`, the tablespace id is 16385, the database 
id is 16387, the table filenode id is 77160, and the last file under the 
filenode is numbered 7.
-
-Enter:
-
-``` pre
-$ hawq register postgres parquet_table hdfs://localhost:8020/temp/hive.paq
-```
+The registration process depends on the following conditions:
 
-After running the `hawq register` command for the file location  
`hdfs://localhost:8020/temp/hive.paq`, the corresponding new location of the 
file in HDFS is:  `hdfs://localhost:8020/hawq_default/16385/16387/77160/8`. The 
command then updates the metadata of the table `parquet_table` in HAWQ, which 
is contained in the table `pg_aoseg.pg_paqseg_77160`. The pg\_aoseg is a fixed 
schema for row-oriented and parquet ao tables. For row-oriented tables, table 
name prefix is pg\_aoseg. The table name prefix for parquet tables is 
pg\_paqseg. 77160 is the relation id of the table.
+-  If the table exists, the files are appended to the existing HAWQ table.
+-  If a table does not exist, it is created and registered into HAWQ. 
+-  If the -\-force option is used, the data in existing catalog tables is 
erased and re-registered.
 
-To locate the table, you can either find the relation ID by looking up the 
catalog table pg\_class by running `select oid from pg_class where 
relname=$relname` or by finding the table name by using the command `select 
segrelid from pg_appendonly where relid = $relid` then running `select relname 
from pg_class where oid = segrelid`.
+###Limitations for Registering Hive Tables to HAWQ
+The Hive data types currently supported for registering Hive tables into HAWQ 
tables are: boolean, int, smallint, tinyint, bigint, float, double, string, 
binary, char, and varchar.
 
-**Recommendation:** Before running ```hawq register```, create a copy of the 
parquet file to be registered, then run ```hawq register``` on the copy. This 
leaves the original file available for additional Hive queries or if a data 
mapping error is encountered.
+The following Hive data types cannot be converted to HAWQ equivalents: 
timestamp, decimal, array, struct, map, and union.
 
-##Data Type Mapping<a id="topic1__section7"></a>
+###Data Type Mapping<a id="topic1__section7"></a>
 
-HAWQ and parquet tables and HIVE and HAWQ tables use different data types. 
Mapping must be used for compatibility. You are responsible for making sure 
your implementation is mapped to the appropriate data type before running `hawq 
register`. The tables below show equivalent data types, if available.
+HAWQ tables and Parquet files use different data types, as do Hive and HAWQ 
tables. Mapping must be used for compatibility. You are responsible for making 
sure your implementation is mapped to the appropriate data type before running 
`hawq register`. The tables below show equivalent data types, if available.
 
 <span class="tablecap">Table 1. HAWQ to Parquet Mapping</span>
 
@@ -187,5 +169,84 @@ group {
 | varchar  | varchar  |
 
 
+## <a id="topic1__section5"></a>Options
+
+**General Options**
+
+<dt>-? (show help) </dt>  
+<dd>Show help, then exit.</dd>
+
+<dt>-\\\-version  </dt> 
+<dd>Show the version of this utility, then exit.</dd>
+
+
+**Connection Options**
+
+<dt>-h , -\\\-host \<hostname\> </dt>
+<dd>Specifies the host name of the machine on which the HAWQ master database 
server is running. If not specified, reads from the environment variable 
`$PGHOST` or defaults to `localhost`.</dd>
+
+<dt> -p , -\\\-port \<port\> </dt> 
+<dd>Specifies the TCP port on which the HAWQ master database server is 
listening for connections. If not specified, reads from the environment 
variable `$PGPORT` or defaults to 5432.</dd>
+
+<dt>-U , -\\\-user \<username\> </dt> 
+<dd>The database role name to connect as. If not specified, reads from the 
environment variable `$PGUSER` or defaults to the current system user name.</dd>
+
+<dt>-d  , -\\\-database \<databasename\>  </dt>
+<dd>The database to register the Parquet HDFS data into. The default is 
`postgres`.</dd>
+
+<dt>-f , -\\\-filepath \<hdfspath\></dt>
+<dd>The path of the file or directory in HDFS containing the files to be 
registered.</dd>
+ 
+<dt>\<tablename\> </dt>
+<dd>The HAWQ table that will store the data to be registered. If the --config 
option is not supplied, the table cannot use hash distribution. Random table 
distribution is strongly preferred. If hash distribution must be used, make 
sure that the distribution policy for the data files described in the .yml file 
is consistent with the table being registered into.</dd>
+
+####Miscellaneous Options
+
+The following options are used with specific usage models.
+
+<dt>-e , -\\\-eof \<eof\></dt>
+<dd>Specify the end of the file to be registered. \<eof\> represents the valid 
content length of the file to be used, in bytes: a value between 0 and the 
actual size of the file. If this option is not included, the actual file size, 
or size of files within a folder, is used. Used with Usage Model 1.</dd>
+
+<dt>-F , -\\\-force</dt>
+<dd>Used for disaster recovery of a cluster. Clears all HDFS-related catalog 
contents in `pg_aoseg.pg_paqseg_$relid` and re-registers files to a specified 
table. The HDFS files are not removed or modified. To use this option for 
recovery, data is assumed to be periodically imported to the cluster to be 
recovered. Used with Usage Model 2.</dd>
+
+<dt>-c , -\\\-config \<yml_config\> </dt> 
+<dd>Registers files specified by YAML-format configuration files into HAWQ. 
Used with Usage Model 2.</dd>
+
+
+## <a id="topic1__section6"></a>Example: Usage Model 2
+
+This example shows how to register files using a .yml configuration file. This 
file is usually generated by the `hawq extract` command. 
+
+In SQL, create a table and insert data into the table:
+
+```
+create table paq1(a int, b varchar(10))with(appendonly=true, 
orientation=parquet);
+insert into paq1 values(generate_series(1,1000), 'abcde');
+```
+
+In HAWQ, extract the table's metadata.
+
+```
+hawq extract -o paq1.yml paq1
+```
+
+In HAWQ, use the .yml file to register the new table paq2:
+
+```
+hawq register --config paq1.yml paq2
+```
+
+In SQL, select the new table to determine if the content has already been 
registered:
+
+```
+select count(*) from paq2;
+```
+The result should return 1000.
+
+## See Also
+
+[hawq extract](hawqextract.html#topic1)
+
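A minimal sketch, assuming you have the Hive column list at hand: before running `hawq register` on a Hive-generated Parquet file, the Hive column types can be checked against Table 2 and the limitations listed in this commit. The function name `map_hive_columns` is hypothetical and is not part of any HAWQ utility. Where Table 2 lists two spellings (for example `int4 / int`), the sketch picks the first, and it drops length modifiers such as `(10)` for simplicity.

```python
# Hive-to-HAWQ type names copied from Table 2 in this commit.
HIVE_TO_HAWQ = {
    "boolean": "bool",
    "tinyint": "int2",
    "smallint": "int2",
    "int": "int4",
    "bigint": "int8",
    "float": "float4",
    "double": "float8",
    "string": "varchar",
    "binary": "bytea",
    "char": "char",
    "varchar": "varchar",
}


def map_hive_columns(columns):
    """Map (name, hive_type) pairs to (name, hawq_type) pairs.

    Raises ValueError for any Hive type without a HAWQ equivalent in Table 2,
    which covers timestamp, decimal, array, struct, map, and union.
    """
    mapped = []
    for name, hive_type in columns:
        # Strip modifiers such as "(10)" or "<int>" to get the base type name.
        base = hive_type.strip().lower().split("(")[0].split("<")[0].strip()
        if base not in HIVE_TO_HAWQ:
            raise ValueError(
                "column {!r}: Hive type {!r} has no HAWQ equivalent".format(name, hive_type)
            )
        mapped.append((name, HIVE_TO_HAWQ[base]))
    return mapped


print(map_hive_columns([("a", "int"), ("b", "varchar(10)")]))
# -> [('a', 'int4'), ('b', 'varchar')]
```

A check like this only confirms that each column has a mapping; verifying that the target table DDL actually uses the mapped types is still a separate step.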
