Github user dyozie commented on a diff in the pull request:
https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81397353
--- Diff: reference/cli/admin_utilities/hawqregister.html.md.erb ---
@@ -2,102 +2,83 @@
title: hawq register
---
-Loads and registers external parquet-formatted data in HDFS into a corresponding table in HAWQ.
+Loads and registers
+AO or Parquet-formatted data in HDFS into a corresponding table in HAWQ.
## <a id="topic1__section2"></a>Synopsis
``` pre
-hawq register <databasename> <tablename> <hdfspath>
+Usage 1:
+hawq register [<connection_options>] [-f <hdfsfilepath>] [-e <eof>] <tablename>
+
+Usage 2:
+hawq register [<connection_options>] [-c <configfilepath>] [--force] <tablename>
+
+Connection Options:
[-h <hostname>]
[-p <port>]
[-U <username>]
[-d <database>]
- [-t <tablename>]
+
+Misc. Options:
[-f <filepath>]
+ [-e <eof>]
+ [--force]
[-c <yml_config>]
hawq register help | -?
hawq register --version
```
## <a id="topic1__section3"></a>Prerequisites
-The client machine where `hawq register` is executed must have the following:
+The client machine where `hawq register` is executed must meet the following conditions:
- Network access to and from all hosts in your HAWQ cluster (master and segments) and the hosts where the data to be loaded is located.
+- The Hadoop client must be configured and the hdfs filepath specified.
- The files to be registered and the HAWQ table must be located in the same HDFS cluster.
- The target table DDL is configured with the correct data type mapping.
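A minimal sanity check for the Hadoop-client and HDFS-path prerequisites, assuming the file to be registered lives at `hdfs://localhost:8020/temp/hive.paq` (an illustrative path), might look like:

``` pre
# illustrative path; confirms the Hadoop client can reach the file to be registered
$ hdfs dfs -ls hdfs://localhost:8020/temp/hive.paq
```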
## <a id="topic1__section4"></a>Description
-`hawq register` is a utility that loads and registers existing or external parquet data in HDFS into HAWQ, so that it can be directly ingested and accessed through HAWQ. Parquet data from the file or directory in the specified path is loaded into the appropriate HAWQ table directory in HDFS and the utility updates the corresponding HAWQ metadata for the files.
+`hawq register` is a utility that loads and registers existing data files or folders in HDFS into HAWQ internal tables, allowing HAWQ to directly read the data and use internal table processing for operations such as transactions and high performance, without needing to load or copy it. Data from the file or directory specified by \<hdfsfilepath\> is loaded into the appropriate HAWQ table directory in HDFS and the utility updates the corresponding HAWQ metadata for the files.
-Only parquet tables can be loaded using the `hawq register` command. Metadata for the parquet file(s) and the destination table must be consistent. Different data types are used by HAWQ tables and parquet tables, so the data is mapped. You must verify that the structure of the parquet files and the HAWQ table are compatible before running `hawq register`.
+You can use `hawq register` to:
-Note: only HAWQ or HIVE-generated parquet tables are currently supported.
+- Load and register external Parquet-formatted file data generated by an external system such as Hive or Spark.
+- Recover cluster data from a backup cluster.
-###Limitations for Registering Hive Tables to HAWQ
-The currently-supported data types for generating Hive tables into HAWQ tables are: boolean, int, smallint, tinyint, bigint, float, double, string, binary, char, and varchar.
+Two usage models are available.
-The following HIVE data types cannot be converted to HAWQ equivalents: timestamp, decimal, array, struct, map, and union.
+###Usage Model 1: register file data to an existing table.
+`hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-f filepath] [-e eof] <tablename>`
-## <a id="topic1__section5"></a>Options
-
-**General Options**
-
-<dt>-? (show help) </dt>
-<dd>Show help, then exit.
-
-<dt>-\\\-version </dt>
-<dd>Show the version of this utility, then exit.</dd>
-
-
-**Connection Options**
-
-<dt>-h \<hostname\> </dt>
-<dd>Specifies the host name of the machine on which the HAWQ master database server is running. If not specified, reads from the environment variable `$PGHOST` or defaults to `localhost`.</dd>
-
-<dt> -p \<port\> </dt>
-<dd>Specifies the TCP port on which the HAWQ master database server is listening for connections. If not specified, reads from the environment variable `$PGPORT` or defaults to 5432.</dd>
+Metadata for the Parquet file(s) and the destination table must be consistent. Different data types are used by HAWQ tables and Parquet files, so the data is mapped. Refer to the section [Data Type Mapping](hawqregister.html#topic1__section7) below. You must verify that the structure of the Parquet files and the HAWQ table are compatible before running `hawq register`.
-<dt>-U \<username\> </dt>
-<dd>The database role name to connect as. If not specified, reads from the environment variable `$PGUSER` or defaults to the current system user name.</dd>
+####Limitations
+Only HAWQ or Hive-generated Parquet tables are supported.
+Hash tables and partitioned tables are not supported in this usage model.
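A minimal sketch of this usage model, assuming an existing randomly distributed HAWQ table named `parquet_table` in the `postgres` database and a Hive-generated Parquet file at `hdfs://localhost:8020/temp/hive.paq` (both names are illustrative), might look like:

``` pre
# illustrative names and path; verify the destination table's structure first
$ psql -d postgres -c "\d parquet_table"
$ hawq register -d postgres -f hdfs://localhost:8020/temp/hive.paq parquet_table
```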
-<dt>-d , --database \<databasename\> </dt>
-<dd>The database to register the parquet HDFS data into. The default is `postgres`<dd>
+###Usage Model 2: Use information from a YAML configuration file to register data
-<dt>-t , --tablename \<tablename\> </dt>
-<dd>The HAWQ table that will store the parquet data. The table cannot use hash distribution: only tables using random distribution can be registered into HAWQ.</dd>
-
-<dt>-f , --filepath \<hdfspath\></dt>
-<dd>The path of the file or directory in HDFS containing the files to be registered.</dd>
-
-<dt>-c , --config \<yml_config\> </dt>
-<dd>Registers a YAML-format configuration file into HAWQ.</dd>
-
-
+`hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-c configfile] [--force] <tablename>`
-## <a id="topic1__section6"></a>Examples
+Files generated by the `hawq extract` command are registered through use of metadata in a YAML configuration file. Both AO and Parquet tables can be registered. Tables need not exist in HAWQ before being registered.
-This example shows how to register a HIVE-generated parquet file in HDFS into the table `parquet_table` in HAWQ, which is in the database named `postgres`. The file path of the HIVE-generated file is `hdfs://localhost:8020/temp/hive.paq`.
-
-For the purposes of this example, assume that the location of the database is `hdfs://localhost:8020/hawq_default`, the tablespace id is 16385, the database id is 16387, the table filenode id is 77160, and the last file under the filenode is numbered 7.
-
-Enter:
-
-``` pre
-$ hawq register postgres parquet_table hdfs://localhost:8020/temp/hive.paq
-```
+The register process behaves differently depending on the following conditions:
-After running the `hawq register` command for the file location `hdfs://localhost:8020/temp/hive.paq`, the corresponding new location of the file in HDFS is: `hdfs://localhost:8020/hawq_default/16385/16387/77160/8`. The command then updates the metadata of the table `parquet_table` in HAWQ, which is contained in the table `pg_aoseg.pg_paqseg_77160`. The pg\_aoseg is a fixed schema for row-oriented and parquet ao tables. For row-oriented tables, table name prefix is pg\_aoseg. The table name prefix for parquet tables is pg\_paqseg. 77160 is the relation id of the table.
+- Existing tables have files appended to the existing HAWQ table.
+- If a table does not exist, it is created and registered into HAWQ.
+- If the -\-force option is used, the data in existing catalog tables is erased and re-registered.
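A minimal sketch of this usage model, assuming a YAML file `my_table.yml` previously produced by `hawq extract` and a target table named `my_table` (both names are illustrative), might look like:

``` pre
# illustrative names; the YAML file comes from a prior hawq extract run
$ hawq register -d postgres -c my_table.yml my_table
# with --force, data in existing catalog tables is erased and re-registered
$ hawq register -d postgres -c my_table.yml --force my_table
```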
-To locate the table, you can either find the relation ID by looking up the catalog table pg\_class by running `select oid from pg_class where relname=$relname` or by finding the table name by using the command `select segrelid from pg_appendonly where relid = $relid` then running `select relname from pg_class where oid = segrelid`.
+###Limitations for Registering Hive Tables to HAWQ
+The currently-supported data types for generating Hive tables into HAWQ tables are: boolean, int, smallint, tinyint, bigint, float, double, string, binary, char, and varchar.
-**Recommendation:** Before running ```hawq register```, create a copy of the parquet file to be registered, then run ```hawq register``` on the copy. This leaves the original file available for additional Hive queries or if a data mapping error is encountered.
+The following Hive data types cannot be converted to HAWQ equivalents: timestamp, decimal, array, struct, map, and union.
-##Data Type Mapping<a id="topic1__section7"></a>
+###Data Type Mapping<a id="topic1__section7"></a>
-HAWQ and parquet tables and HIVE and HAWQ tables use different data types. Mapping must be used for compatibility. You are responsible for making sure your implementation is mapped to the appropriate data type before running `hawq register`. The tables below show equivalent data types, if available.
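As an illustrative sketch of preparing a compatible destination table before registering, assuming a source file with an int, a string, and a double column (the column names and the int4/varchar/float8 mappings shown are assumptions to be checked against the mapping tables below), one might create the table with random distribution, since hash-distributed tables cannot be registered:

``` pre
# illustrative DDL; confirm the type mappings against the tables below
$ psql -d postgres -c "CREATE TABLE parquet_table (id int4, name varchar, amount float8) WITH (appendonly=true, orientation=parquet) DISTRIBUTED RANDOMLY;"
```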
--- End diff ---
See previous edit.