[ 
https://issues.apache.org/jira/browse/HAWQ-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891592#comment-15891592
 ] 

Lili Ma commented on HAWQ-760:
------------------------------

[~kdunn926] 

hawq register doesn't check the HAWQ version number.  Although HAWQ 2.X 
optimized the storage for AO format tables, it can still read AO files 
generated by HAWQ 1.X. The Parquet file format has not changed, so there won't 
be a problem there either. So I don't think you will run into problems if you 
register tables from HAWQ 1.X into HAWQ 2.X.

If you register Parquet files generated by other products such as Hive or 
Impala, which may use a later version, hawq register doesn't throw an error at 
registration time.  But you may hit errors when you select from the registered 
table.  For example, if a data page is encoded with dictionary encoding, HAWQ 
throws an error indicating that it cannot process that page. 
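
To catch that case before registering, one option is to inspect the encodings 
recorded in the Parquet file footer.  The sketch below is only an 
illustration: the function name and the sample metadata are hypothetical, and 
it assumes the per-column encoding names have already been collected with 
some Parquet reader.  It checks at column level, which is coarser than the 
per-page condition described above.

```python
# Dictionary encoding names from the Parquet format specification.
# Treating both as unsupported by HAWQ is an assumption of this sketch.
DICTIONARY_ENCODINGS = {"PLAIN_DICTIONARY", "RLE_DICTIONARY"}


def dictionary_encoded_columns(column_encodings):
    """Return the sorted names of columns that list a dictionary encoding.

    column_encodings maps a column name to an iterable of encoding names,
    as read from the file footer by a Parquet reader.
    """
    return sorted(
        name
        for name, encodings in column_encodings.items()
        if DICTIONARY_ENCODINGS.intersection(encodings)
    )


# Hypothetical footer metadata for a file written by another engine.
sample = {
    "id": ("PLAIN", "RLE"),
    "country": ("PLAIN_DICTIONARY", "RLE"),
}
print(dictionary_encoded_columns(sample))
```

A non-empty result suggests the file may fail at select time even though 
hawq register accepts it.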

> Hawq register
> -------------
>
>                 Key: HAWQ-760
>                 URL: https://issues.apache.org/jira/browse/HAWQ-760
>             Project: Apache HAWQ
>          Issue Type: New Feature
>          Components: Command Line Tools
>            Reporter: Yangcheng Luo
>            Assignee: Lili Ma
>             Fix For: backlog
>
>
> Scenario: 
> 1. Register a Parquet file generated by other systems, such as Hive, Spark, 
> etc.
> 2. Cluster disaster recovery. Two clusters co-exist, and data is 
> periodically imported from Cluster A to Cluster B. The data needs to be 
> registered in Cluster B.
> 3. Table rollback. Checkpoints are taken somewhere, and the table needs to 
> be rolled back to a previous checkpoint. 
> Usage 1
> Description
> Register a file or a folder to an existing table. When registering a file, 
> the eof of the file can be specified; if eof is not specified, the actual 
> file size is used. When registering a folder, the actual file sizes are 
> used.
> hawq register [-h hostname] [-p port] [-U username] [-d databasename] 
> [-f filepath] [-e eof] <tablename>
> Usage 2
> Description
> Register according to .yml configuration file. 
> hawq register [-h hostname] [-p port] [-U username] [-d databasename] 
> [-c config] [--force] [--repair] <tablename> 
> Behavior:
> 1. If the table doesn't exist, hawq register automatically creates the table 
> and registers the files listed in the .yml configuration file. It uses the 
> file sizes specified in the .yml file to update the catalog table. 
> 2. If the table already exists and neither --force nor --repair is 
> specified, no table is created and the files specified in the .yml file are 
> registered directly to the table. Note that if a file is under the table 
> directory in HDFS, an error is thrown stating that to-be-registered files 
> should not be under the table path.
> 3. If the table already exists and --force is specified, all the catalog 
> contents in pg_aoseg.pg_paqseg_$relid are cleared while the files on HDFS 
> are kept, and then all the files are re-registered to the table.  This is 
> for scenario 2.
> 4. If the table already exists and --repair is specified, both the file 
> folder and the catalog table pg_aoseg.pg_paqseg_$relid are changed to the 
> state that the .yml file describes. Note that files generated since the 
> checkpoint may be deleted here. Also note that all the files in the .yml 
> file must be under the table folder on HDFS. Limitation: hash table 
> redistribution, table truncate and table drop are not supported. This is 
> for scenario 3.
> Requirements for both cases:
> 1. The path of the files to be registered must be in the same HDFS cluster 
> as HAWQ.
> 2. If the target is a hash table, the number of registered files must be 
> equal to or a multiple of the hash table bucket number.
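
The two usages and the hash table requirement quoted above can be sketched as 
a small helper.  This is a minimal illustration, not part of HAWQ: the 
function names are hypothetical, the flags are taken from the usage lines in 
the issue description, and the bucket check assumes the requirement means 
"the file count is a multiple of the bucket number".

```python
def build_register_command(table, host=None, port=None, user=None,
                           database=None, filepath=None, eof=None,
                           config=None, force=False, repair=False):
    """Build an argv list for `hawq register` from the quoted usage lines."""
    # Usage 1 takes -f filepath, usage 2 takes -c config; require exactly one.
    if (filepath is None) == (config is None):
        raise ValueError("pass exactly one of filepath or config")
    if config is None and (force or repair):
        raise ValueError("--force and --repair belong to usage 2 (-c config)")
    if filepath is None and eof is not None:
        raise ValueError("-e eof belongs to usage 1 (-f filepath)")

    argv = ["hawq", "register"]
    # Connection options shared by both usages.
    for flag, value in (("-h", host), ("-p", port),
                        ("-U", user), ("-d", database)):
        if value is not None:
            argv += [flag, str(value)]
    if filepath is not None:
        argv += ["-f", filepath]
        if eof is not None:
            argv += ["-e", str(eof)]
    else:
        argv += ["-c", config]
        if force:
            argv.append("--force")
        if repair:
            argv.append("--repair")
    argv.append(table)
    return argv


def file_count_fits_buckets(file_count, bucket_number):
    """True if file_count is a positive multiple of bucket_number."""
    return (bucket_number > 0 and file_count > 0
            and file_count % bucket_number == 0)


# Hypothetical table, database and paths, for illustration only.
print(build_register_command("sales", database="postgres",
                             filepath="/data/sales.paq", eof=1024))
print(build_register_command("sales", database="postgres",
                             config="sales.yml", force=True))
print(file_count_fits_buckets(12, 6), file_count_fits_buckets(7, 6))
```

The returned list can be passed to subprocess.run on a host where the hawq 
command is available.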



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
