Edits from David Y.

Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: 
http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/8e116c57
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/8e116c57
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/8e116c57

Branch: refs/heads/master
Commit: 8e116c576eb75faea0de55b8d781418cac515c80
Parents: aa65a9c
Author: Jane Beckman <[email protected]>
Authored: Fri Sep 30 15:53:07 2016 -0700
Committer: Jane Beckman <[email protected]>
Committed: Fri Sep 30 15:53:07 2016 -0700

----------------------------------------------------------------------
 datamgmt/load/g-register_files.html.md.erb      | 39 +++++++++-----------
 .../admin_utilities/hawqregister.html.md.erb    | 31 ++++++++--------
 2 files changed, 32 insertions(+), 38 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/8e116c57/datamgmt/load/g-register_files.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-register_files.html.md.erb 
b/datamgmt/load/g-register_files.html.md.erb
index 94d0140..f04c3bb 100644
--- a/datamgmt/load/g-register_files.html.md.erb
+++ b/datamgmt/load/g-register_files.html.md.erb
@@ -1,20 +1,20 @@
 ---
-title: Registering Files into HAWQ Internal Tables
+title: Registering Files into HAWQ Internal Tables<a id="topic1__section1"></a>
 ---
 
 The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
 
-Data from the file or directory specified by \<hdfsfilepath\> is loaded into 
the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
+Data from the file or directory specified by \<hdfsfilepath\> is loaded into 
the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO or Parquet-formatted 
tables in HDFS can be loaded into a corresponding table in HAWQ.
 
 You can use `hawq register` either to:
 
 -  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
 -  Recover cluster data from a backup cluster for disaster recovery. 
 
-Requirements for running `hawq register` on the client server are:
+Requirements for running `hawq register` on the server are:
 
--   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
--   The Hadoop client configured and the hdfs filepath specified.
+-   All hosts in your HAWQ cluster (master and segments) must have network 
access between them and the hosts containing the data to be loaded.
+-   The Hadoop client must be configured and the hdfs filepath specified.
 -   The files to be registered and the HAWQ table must be located in the same 
HDFS cluster.
 -   The target table DDL is configured with the correct data type mapping.
 
@@ -26,7 +26,7 @@ Only HAWQ or Hive-generated Parquet tables are supported. 
Partitioned tables are
 
 Metadata for the Parquet file(s) and the destination table must be consistent. Different data types are used by HAWQ tables and Parquet files, so data must be mapped. You must verify that the structure of the Parquet files and the HAWQ table are compatible before running `hawq register`.
 
-We recommand creating a copy of the Parquet file to be registered before 
running ```hawq register```
+As a best practice, create a copy of the Parquet file to be registered before running `hawq register`.
 You can then run `hawq register` on the copy, leaving the original file available for additional Hive queries or if a data mapping error is encountered.
 
 ###Limitations for Registering Hive Tables to HAWQ
@@ -48,7 +48,7 @@ $ hawq register -d postgres -f 
hdfs://localhost:8020/temp/hive.paq parquet_table
 
 After running the `hawq register` command for the file location `hdfs://localhost:8020/temp/hive.paq`, the corresponding new location of the file in HDFS is: `hdfs://localhost:8020/hawq_default/16385/16387/77160/8`.
 
-The command then updates the metadata of the table `parquet_table` in HAWQ, 
which is contained in the table `pg_aoseg.pg_paqseg_77160`. The pg\_aoseg is a 
fixed schema for row-oriented and Parquet AO tables. For row-oriented tables, 
the table name prefix is pg\_aoseg. The table name prefix for parquet tables is 
pg\_paqseg. 77160 is the relation id of the table.
+The command then updates the metadata of the table `parquet_table` in HAWQ, which is contained in the table `pg_aoseg.pg_paqseg_77160`. The pg\_aoseg schema is a fixed schema for row-oriented and Parquet AO tables. For row-oriented tables, the table name prefix is pg\_aoseg. The table name prefix for Parquet tables is pg\_paqseg. 77160 is the relation id of the table.
 
 To locate the table, either find the relation ID by looking up the catalog 
table pg\_class in SQL by running 
 
@@ -66,7 +66,7 @@ select relname from pg_class where oid = segrelid
 
 ##Registering Data Using Information from a YAML Configuration File<a 
id="topic1__section3"></a>
  
-The `hawq register` command can register HDFS files  by using metadata loaded 
from a YAML configuration file by using the `--config <yaml_config\>` option. 
Both AO and Parquet tables can be registered. Tables need not exist in HAWQ 
before being registered. This function can be useful in disaster recovery, 
allowing information created by the `hawq extract` command to be used to 
re-create HAWQ tables.
+The `hawq register` command can register HDFS files using metadata loaded from a YAML configuration file, specified with the `--config <yaml_config\>` option. Both AO and Parquet tables can be registered. Tables need not exist in HAWQ before being registered. This function can be useful in disaster recovery, allowing information created by the `hawq extract` command to re-create HAWQ tables.
 
 You can also use a YAML configuration file to append HDFS files to an existing HAWQ table or create a table and register it into HAWQ.
 
@@ -76,7 +76,7 @@ Data is registered according to the following conditions:
 
 -  Existing tables have files appended to the existing HAWQ table.
 -  If a table does not exist, it is created and registered into HAWQ. The 
catalog table will be updated with the file size specified by the YAML file.
--  If the --force option is used, the data in existing catalog tables is 
erased and re-registered. All HDFS-related catalog contents in 
`pg_aoseg.pg_paqseg_$relid ` are cleared. The original files on HDFS are 
retained.
+-  If the -\\\-force option is used, the data in existing catalog tables is 
erased and re-registered. All HDFS-related catalog contents in 
`pg_aoseg.pg_paqseg_$relid ` are cleared. The original files on HDFS are 
retained.
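
As a hypothetical sketch of the force condition above (the file and table names are illustrative), a disaster-recovery re-registration might look like:

```
# Clear the HDFS-related catalog entries for table paq1 and re-register
# its files from a previously extracted YAML file; the HDFS files
# themselves are retained.
hawq register --force --config paq1.yml paq1
```

This fragment assumes a running HAWQ cluster and a YAML file previously produced by `hawq extract`.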
 
 Tables using random distribution are preferred for registering into HAWQ. If hash tables are to be registered, the distribution policy in the YAML file must match that of the table being registered into.
 
@@ -85,41 +85,36 @@ In registering hash tables, the size of the registered file 
should be identical
 
 ###Example: Registration using a YAML Configuration File
 
-This example shows how to use hawq register to register HDFS data using a YAML 
configuration file generated by hawq extract. 
+This example shows how to use `hawq register` to register HDFS data using a YAML configuration file generated by `hawq extract`.
 
 First, create a table in SQL and insert some data into it.  
 
 ```
-create table paq1(a int, b varchar(10))with(appendonly=true, 
orientation=parquet);
+=> create table paq1(a int, b varchar(10)) with (appendonly=true, orientation=parquet);
+=> insert into paq1 values(generate_series(1,1000), 'abcde');
 ```
 
-In SQL, run:
-
-```
-insert into paq1 values(generate_series(1,1000), 'abcde');
-```
-
-Go into the hawq administration utilities, and extract the table metadata by 
using the `hawq extract` utility.
+Extract the table metadata by using the `hawq extract` utility.
 
 ```
 hawq extract -o paq1.yml paq1
 ```
 
-Register the data into new table paq2, using the --config option to identify 
the YAML file.
+Register the data into the new table paq2, using the -\\\-config option to identify the YAML file.
 
 ```
 hawq register --config paq1.yml paq2
 ```
-In SQL, select the new table and check to verify that  the content has been 
registered.
+Select the new table and verify that the content has been registered.
 
 ```
-select count(*) from paq2;
+=> select count(*) from paq2;
 ```
 
 
 ##Data Type Mapping<a id="topic1__section4"></a>
 
-HAWQ and parquet tables and HIVE and HAWQ tables use different data types. 
Mapping must be used for metadata compatibility. You are responsible for making 
sure your implementation is mapped to the appropriate data type before running 
`hawq register`. The tables below show equivalent data types, if available.
+Hive and Parquet tables use different data types than HAWQ tables. Mapping must be used for metadata compatibility. You are responsible for making sure your implementation is mapped to the appropriate data types before running `hawq register`. The tables below show equivalent data types, if available.
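
For example, a Hive-generated Parquet file whose schema is `(id int, name string)` might be registered into a HAWQ table declared with mapped types. A hypothetical sketch (the column names and the string-to-varchar mapping are illustrative; verify your own schema against the mapping tables):

```
-- HAWQ target table with types mapped from the Hive schema (id int, name string)
=> create table hive_data (id int, name varchar)
   with (appendonly=true, orientation=parquet);
```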
 
 <span class="tablecap">Table 1. HAWQ to Parquet Mapping</span>
 

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/8e116c57/reference/cli/admin_utilities/hawqregister.html.md.erb
----------------------------------------------------------------------
diff --git a/reference/cli/admin_utilities/hawqregister.html.md.erb 
b/reference/cli/admin_utilities/hawqregister.html.md.erb
index de64a11..02b65fd 100644
--- a/reference/cli/admin_utilities/hawqregister.html.md.erb
+++ b/reference/cli/admin_utilities/hawqregister.html.md.erb
@@ -1,9 +1,8 @@
 ---
-title: hawq register
+title: hawq register<a id="topic1__section1"></a>
 ---
 
-Loads and registers 
-AO or Parquet-formatted data in HDFS into a corresponding table in HAWQ.
+Loads and registers AO or Parquet-formatted tables in HDFS into a 
corresponding table in HAWQ.
 
 ## <a id="topic1__section2"></a>Synopsis
 
@@ -33,9 +32,9 @@ hawq register --version
 
 The client machine where `hawq register` is executed must meet the following 
conditions:
 
--   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   All hosts in your HAWQ cluster (master and segments) must have network 
access between them and the hosts containing the data to be loaded.
 -   The Hadoop client must be configured and the hdfs filepath specified.
--   The files to be registered and the HAWQ table located in the same HDFS 
cluster.
+-   The files to be registered and the HAWQ table must be located in the same 
HDFS cluster.
 -   The target table DDL is configured with the correct data type mapping.
 
 ## <a id="topic1__section4"></a>Description
@@ -49,7 +48,7 @@ You can use `hawq register` to:
 
 Two usage models are available.
 
-###Usage Model 1: register file data to an existing table.
+###Usage Model 1: Register file data to an existing table.
 
 `hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-f filepath] [-e eof] <tablename>`
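
For instance, a hypothetical Usage Model 1 invocation (the host, port, database, file path, and table name are illustrative) might be:

```
hawq register -h localhost -p 5432 -d postgres -f hdfs://localhost:8020/temp/hive.paq parquet_table
```

This fragment assumes a running HAWQ cluster and an existing target table with compatible DDL.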
 
@@ -69,7 +68,7 @@ The register process behaves differently, according to 
different conditions.
 
 -  Existing tables have files appended to the existing HAWQ table.
 -  If a table does not exist, it is created and registered into HAWQ. 
--  If the -\-force option is used, the data in existing catalog tables is 
erased and re-registered.
+-  If the -\\\-force option is used, the data in existing catalog tables is 
erased and re-registered.
 
 ###Limitations for Registering Hive Tables to HAWQ
 The currently supported data types for registering Hive tables into HAWQ tables are: boolean, int, smallint, tinyint, bigint, float, double, string, binary, char, and varchar.
@@ -207,39 +206,39 @@ The following options are used with specific use models.
 <dd>Specify the end of the file to be registered. \<eof\> represents the valid content length of the file in bytes, a value between 0 and the actual size of the file. If this option is not included, the actual file size, or the size of files within a folder, is used. Used with Usage Model 1.</dd>
 
 <dt>-F , -\\\-force</dt>
-<dd>Used for disaster recovery of a cluster. Clears all HDFS-related catalog 
contents in `pg_aoseg.pg_paqseg_$relid `and re-registers files to a specified 
table. The HDFS files are not removed or modified. To use this option for 
recovery, data is assumed to be periodically imported to the cluster to be 
recovered. Used with Use Model 2.</dd>
+<dd>Used for disaster recovery of a cluster. Clears all HDFS-related catalog 
contents in `pg_aoseg.pg_paqseg_$relid `and re-registers files to a specified 
table. The HDFS files are not removed or modified. To use this option for 
recovery, data is assumed to be periodically imported to the cluster to be 
recovered. Used with Usage Model 2.</dd>
 
 <dt>-c , -\\\-config \<yml_config\> </dt> 
-<dd>Registers files specified by YAML-format configuration files into HAWQ. 
Used with Use Model 2.</dd>
+<dd>Registers files specified by YAML-format configuration files into HAWQ. 
Used with Usage Model 2.</dd>
 
 
 ## <a id="topic1__section6"></a>Example: Usage Model 2
 
 This example shows how to register files using a YAML configuration file. This 
file is usually generated by the `hawq extract` command. 
 
-In SQL, create a table and insert data into the table:
+Create a table and insert data into the table:
 
 ```
-create table paq1(a int, b varchar(10))with(appendonly=true, 
orientation=parquet);`
-insert into paq1 values(generate_series(1,1000), 'abcde');
+=> create table paq1(a int, b varchar(10)) with (appendonly=true, orientation=parquet);
+=> insert into paq1 values(generate_series(1,1000), 'abcde');
 ```
 
-In HAWQ, extract the table's metadata.
+Extract the table's metadata.
 
 ```
 hawq extract -o paq1.yml paq1
 ```
 
-In HAWQ, use the YAML file to register the new table paq2:
+Use the YAML file to register the new table paq2:
 
 ```
 hawq register --config paq1.yml paq2
 ```
 
-In SQL, select the new table to determine if the content has already been 
registered:
+Select the new table to verify that the content has been registered:
 
 ```
-select count(*) from paq2;
+=> select count(*) from paq2;
 ```
 The query should return 1000.
 
