[ 
https://issues.apache.org/jira/browse/HAWQ-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624133#comment-15624133
 ] 

Lili Ma edited comment on HAWQ-1034 at 11/1/16 2:44 AM:
--------------------------------------------------------

Repair mode can be thought of as a particular case of force mode.
1) Force mode registers the files according to the yaml configuration file, 
erases all the records in the catalog (pg_aoseg.pg_aoseg(paqseg)_$relid) and 
re-inserts them. It requires that all HDFS files for the table be included in 
the yaml configuration file.
2) Repair mode also registers files according to the yaml configuration file, 
erases the catalog records and re-inserts them. But it doesn't require all the 
HDFS files for the table to be included in the yaml configuration file. It 
directly deletes those files which are under the table directory but not 
included in the yaml configuration file.
Since repair mode may directly delete HDFS files, it carries some risk: if a 
user runs repair mode by mistake, their data may be deleted.  We can instead 
allow them to use force mode, and throw an error for files under the directory 
but not included in the yaml configuration file.  If the user does think the 
files are unnecessary, they can delete the files themselves.
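
The difference between the two modes described above can be sketched with 
plain sets. This is only an illustration: plan_register and its arguments are 
hypothetical names, not part of hawq register, and the sketch touches neither 
HDFS nor the catalog.

```python
def plan_register(yaml_files, hdfs_files, mode):
    """Return (files_to_register, files_to_delete) for the given mode.

    Hypothetical helper that models the behaviour described in this comment:
    yaml_files are the files listed in the yaml configuration file,
    hdfs_files are the files currently under the table directory.
    """
    yaml_set = set(yaml_files)
    # Files under the table directory that the yaml file does not mention.
    extra = sorted(set(hdfs_files) - yaml_set)

    if mode == "force":
        # Proposed behaviour: never delete, report the extra files instead.
        if extra:
            raise ValueError(
                "files not in yaml configuration file: " + ", ".join(extra)
            )
        return sorted(yaml_set), []
    if mode == "repair":
        # Repair mode would delete the extra files, which is the risky part.
        return sorted(yaml_set), extra
    raise ValueError("unknown mode: " + mode)
```

With the same inputs, "repair" returns the extra files as deletions while 
"force" raises an error and leaves the decision to the user.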

The workaround for supporting repair mode with the --force option:
1) If no files have been added since the last checkpoint, where the yaml 
configuration file was generated, force mode can handle it directly.
2) If some files have been added since the last checkpoint and the user does 
want to delete them, we can output information about those files in force mode 
so that users can delete them themselves and then run register in force mode 
again.
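
The two workaround cases above can be walked through with a small simulation. 
It is a sketch under assumptions: register_force is a hypothetical stand-in 
for force mode, the table directory is modelled as an in-memory set, and the 
file names are made up.

```python
def register_force(yaml_files, table_dir):
    """Hypothetical stand-in for force mode.

    Reports files in table_dir that the yaml configuration file does not
    list instead of deleting them.
    """
    extra = sorted(set(table_dir) - set(yaml_files))
    return {"registered": not extra, "extra_files": extra}


yaml_files = ["seg/1", "seg/2"]  # made-up file names

# Case 1: no files added since the checkpoint, force mode succeeds directly.
case1 = register_force(yaml_files, {"seg/1", "seg/2"})

# Case 2: a file was added after the checkpoint; force mode only reports it.
table_dir = {"seg/1", "seg/2", "seg/3"}
first_try = register_force(yaml_files, table_dir)

# The user reviews the report and deletes the files by hand ...
table_dir -= set(first_try["extra_files"])

# ... and then runs register in force mode again.
second_try = register_force(yaml_files, table_dir)
```

The deletion step stays outside register_force, so files are only ever 
removed by an explicit user action.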

Since we can use force mode to implement the repair feature, we will remove 
the existing code for repair mode and close this JIRA.  Thanks


was (Author: lilima):
Repair mode can be thought of as a particular case of force mode.
1) Force mode registers the files according to the yaml configuration file, 
erases all the records in the catalog (pg_aoseg.pg_aoseg(paqseg)_$relid) and 
re-inserts them. It requires that all HDFS files for the table be included in 
the yaml configuration file.
2) Repair mode also registers files according to the yaml configuration file, 
erases the catalog records and re-inserts them. But it doesn't require all the 
HDFS files for the table to be included in the yaml configuration file. It 
directly deletes those files which are under the table directory but not 
included in the yaml configuration file.
I'm a little concerned about directly deleting HDFS files: if a user runs 
repair mode by mistake, their data may be deleted.  So, what if we just allow 
them to use force mode, and throw an error for files under the directory but 
not included in the yaml configuration file?  If the user does think the files 
are unnecessary, they can delete the files themselves.

The workaround for supporting repair mode with the --force option:
1) If no files have been added since the last checkpoint, where the yaml 
configuration file was generated, force mode can handle it directly.
2) If some files have been added since the last checkpoint and the user does 
want to delete them, we can output information about those files in force mode 
so that users can delete them themselves and then run register in force mode 
again.

Since we can use force mode to implement the repair feature, we will remove 
the existing code for repair mode and close this JIRA.  Thanks

> add --repair option for hawq register
> -------------------------------------
>
>                 Key: HAWQ-1034
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1034
>             Project: Apache HAWQ
>          Issue Type: Sub-task
>          Components: Command Line Tools
>    Affects Versions: 2.0.1.0-incubating
>            Reporter: Lili Ma
>            Assignee: Chunling Wang
>             Fix For: 2.0.1.0-incubating
>
>
> add --repair option for hawq register
> Will change both the file folder and the catalog table 
> pg_aoseg.pg_paqseg_$relid to the state that the .yml file configures. Note 
> that some files newly generated since the checkpoint may be deleted here. 
> Also note that all the files in the .yml file should be under the table 
> folder on HDFS. Limitation: does not support cases of hash table 
> redistribution, table truncate and table drop. This is for the table 
> rollback scenario: take checkpoints at some point, and later roll back to a 
> previous checkpoint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
