[ 
https://issues.apache.org/jira/browse/HAWQ-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890537#comment-15890537
 ] 

Kyle R Dunn commented on HAWQ-760:
----------------------------------

[~lilima] - How does hawq register change or handle trying to register files 
from different versions where the catalog could have changed? i.e. what would 
happen if I try to register tables from hawq 1.x into hawq 2.x?

> Hawq register
> -------------
>
>                 Key: HAWQ-760
>                 URL: https://issues.apache.org/jira/browse/HAWQ-760
>             Project: Apache HAWQ
>          Issue Type: New Feature
>          Components: Command Line Tools
>            Reporter: Yangcheng Luo
>            Assignee: Lili Ma
>             Fix For: backlog
>
>
> Scenario: 
> 1. Register a parquet file generated by other systems, such as Hive, Spark, 
> etc.
> 2. For cluster Disaster Recovery. Two clusters co-exist, periodically import 
> data from Cluster A to Cluster B. Need Register data to Cluster B.
> 3. For the rollback of table. Do checkpoints somewhere, and need to rollback 
> to previous checkpoint. 
> Usage1
> Description
> Register a file/folder to an existing table. Can register a file or a folder. 
> If we register a file, can specify eof of this file. If eof not specified, 
> directly use actual file size. If we register a folder, directly use actual 
> file size.
> hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-f 
> filepath] [-e eof]<tablename>
> Usage 2
> Description
> Register according to .yml configuration file. 
> hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-c 
> config] [--force][--repair] <tablename> 
> Behavior:
> 1. If table doesn't exist, will automatically create the table and register 
> the files in .yml configuration file. Will use the filesize specified in .yml 
> to update the catalog table. 
> 2. If table already exist, and neither --force nor --repair configured. Do 
> not create any table, and directly register the files specified in .yml file 
> to the table. Note that if the file is under table directory in HDFS, will 
> throw error, say, to-be-registered files should not under the table path.
> 3. If table already exist, and --force is specified. Will clear all the 
> catalog contents in pg_aoseg.pg_paqseg_$relid while keep the files on HDFS, 
> and then re-register all the files to the table.  This is for scenario 2.
> 4. If table already exist, and --repair is specified. Will change both file 
> folder and catalog table pg_aoseg.pg_paqseg_$relid to the state which .yml 
> file configures. Note may some new generated files since the checkpoint may 
> be deleted here. Also note the all the files in .yml file should all under 
> the table folder on HDFS. Limitation: Do not support cases for hash table 
> redistribution, table truncate and table drop. This is for scenario 3.
> Requirements for both the cases:
> 1. To be registered file path has to colocate with HAWQ in the same HDFS 
> cluster.
> 2. If to be registered is a hash table, the registered file number should be 
> one or multiple times or hash table bucket number.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to