Repository: incubator-hawq
Updated Branches:
  refs/heads/master 1aa0fbf5a -> b661d3a0e
HAWQ-1029. Update hawqregister_help info for 2.0.1.0 hawq release.

Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq/commit/b661d3a0
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq/tree/b661d3a0
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq/diff/b661d3a0

Branch: refs/heads/master
Commit: b661d3a0ed40aec90f3c23f9badd8898fbec53f5
Parents: 1aa0fbf
Author: xunzhang <[email protected]>
Authored: Thu Sep 1 21:08:34 2016 +0800
Committer: Lili Ma <[email protected]>
Committed: Fri Sep 23 20:08:13 2016 +0800

----------------------------------------------------------------------
 tools/doc/hawqregister_help | 67 +++++++++++++++++++++++++---------------
 1 file changed, 42 insertions(+), 25 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-hawq/blob/b661d3a0/tools/doc/hawqregister_help
----------------------------------------------------------------------
diff --git a/tools/doc/hawqregister_help b/tools/doc/hawqregister_help
index a664127..b8577d7 100644
--- a/tools/doc/hawqregister_help
+++ b/tools/doc/hawqregister_help
@@ -1,14 +1,13 @@
 COMMAND NAME: hawq register
-Usage1: Register parquet files generated by other system into the corrsponding table in HAWQ
-Usage2: Register parquet/ao table from laterst-sync-metadata in yaml format
+Usage1: Register parquet files generated by other system into corrsponding table in HAWQ.
+Usage2: Register parquet/ao table from yaml configuration file.
 *****************************************************
 SYNOPSIS
 *****************************************************
-
-Usage1: hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-f filepath] <tablename>
-Usage2: hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-c config] <tablename>
+Usage1: hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-f filepath] [-e eof] <tablename>
+Usage2: hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-c config] [-F, --force] <tablename>
 hawq register help
 hawq register -?
@@ -18,10 +17,9 @@ hawq register --version
 *****************************************************
 DESCRIPTION
 *****************************************************
-
 Use Case1:
-"hawq register" is a utility to register file(s) on HDFS into
-the table in HAWQ. It moves the file in the path(if path
+"hawq register" is an utility to register file(s) on HDFS into
+the table in HAWQ. It moves file(s) in the path(if path
 refers to a file) or files under the path(if path refers to a directory) into the table directory corresponding to the table, and then update the table meta data to include the files.
@@ -37,10 +35,23 @@ The file(s) to be registered and the table in HAWQ must be in the same HDFS cluster.
 Use Case2:
-User should be able to use hawq register to register table files into a new HAWQ cluster.
-It is some kind of protecting against corruption from users' perspective.
-Users use the last-known-good metadata to update the portion of catalog managing HDFS blocks.
-The table files or dictionary should be backuped(such as using distcp) into the same path in the new HDFS setting.
+Hawq register can register both AO and parquet format table, and the files to be registered are listed in the .yml configuration file.
+This configuration file can be generated by hawq extract.
Register through .yml configuration doesn’t require the table already exist,
+since .yml file contains table schema already.
+HAWQ register behaviors differently with different options:
+ * If the table does not exist, hawq register will create table and do register.
+ * If table already exist, hawq register will append the files to the existing table.
+ * If --force option specified, hawq register will erase existing catalog
+ table pg_aoseg.pg_aoseg_$relid/pg_aoseg.pg_paqseg_$relid data for the table and
+ re-register according to .yml configuration file definition. Note. If there are
+ files under table directory which are not specified in .yml configuration file, it will throw error out.
+Note. Without --force specified, if some file specified in .yml configuration file lie under the table directory,
+ hawq register will throw error out.
+Note. With --force option specified, if there are files under table directory which are not specified in .yml configuration file,
+ hawq register will throw error out.
+Note. In usage2, if the table is hash distributed, hawq register just check the file number to be registered
+ has to be multiple times of this table’s bucket number, and check whether the distribution key specified
+ in .yml configuration file is same as that of table. It does not check whether files are actually distributed by the key.
 To use "hawq register", HAWQ must have been started. Currently "hawq register" supports both AO and Parquet formats in this case.
@@ -49,7 +60,6 @@ The partition table is not supported in this version, and we will support it soo
 *****************************************************
 Arguments
 *****************************************************
-
 <tablename>
 Name of the table to be registered into.
@@ -57,7 +67,6 @@ Name of the table to be registered into.
 *****************************************************
 OPTIONS
 *****************************************************
-
 -? (help)
 Displays the online help.
@@ -69,7 +78,6 @@ Displays the version of this utility.
 *****************************************************
 CONNECTION OPTIONS
 *****************************************************
-
 -h hostname
 Specifies the host name of the machine on which the HAWQ master
@@ -91,7 +99,6 @@ CONNECTION OPTIONS
 *****************************************************
 EXAMPLE FOR USAGE1
 *****************************************************
-
 Run "hawq register" to register a parquet file in HDFS with path 'hdfs://localhost:8020/temp/hive.paq' generated by hive into table 'parquet_table' in HAWQ, which is in the database named 'postgres'.
@@ -100,7 +107,7 @@ Assume the location of the database is 'hdfs://localhost:8020/hawq_default', tablespace id is '16385', database id is '16387', table filenode id is '77160', last file under the filenode numbered '7'.
-$ hawq register postgres parquet_table hdfs://localhost:8020/temp/hive.paq
+$ hawq register -d postgres -f hdfs://localhost:8020/temp/hive.paq parquet_table
 This will move the file 'hdfs://localhost:8020/temp/hive.paq' into the corresponding new place 'hdfs://localhost:8020/hawq_default/16385/16387/77160/8' in HDFS, then
@@ -110,14 +117,24 @@ table 'pg_aoseg.pg_paqseg_77160'.
 *****************************************************
 EXAMPLE FOR USAGE2
 *****************************************************
-$ psql -c "drop table if exists table;"
-$ psql -c "create table table(i int) with (appendonly=true, orientation=parquet) distributed by (i);"
-$ psql -c "insert into table values(1), (2), (3);"
-$ hawq extract -d postgres -o t.yml table
-$ hawq register -d postgres -c t.yml newtable
-In this example, suppose that "table" is a table in old HAWQ Cluster, user dump "t.yml" yaml file to
-save the metadata of "table". To register the "newtable" in a new HAWQ Cluster, user run "hawq register"
-to register the newtable with the given yaml file "t.yml".
+This example shows hawq register functionality of hawq register according to yml configuration file.
+Usually the yml configuration file is generated by hawq extract.
+This example shows the life cycle of hawq extract and hawq register.
+
+Firstly, create a table and insert some data into it:
+$ psql -c "create table paq1(a int, b varchar(10))with(appendonly=true, orientation=parquet);"
+$ psql -c "insert into paq1 values(generate_series(1,1000), 'abcde');"
+
+Secondly, extract the table metadata information out:
+$ hawq extract -o paq1.yml paq1
+
+Thirdly, register to new table paq2 identifying yml file:
+$ hawq register --config paq1.yml paq2
+
+Finally, select the new table to look at whether the content has already been registered.
+$ select count(*) from paq2;
+
+In the above example, the final result should be return 1000.
 *****************************************************
 DATA TYPES
