[ 
https://issues.apache.org/jira/browse/HBASE-15940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315073#comment-15315073
 ] 

Stephen Yuan Jiang edited comment on HBASE-15940 at 6/4/16 12:02 AM:
---------------------------------------------------------------------

The existing flow is: offline the region, move some of the files in the 
overlapped regions to a new region, then sideline the old region directory.  So 
any files (e.g. the old .regioninfo file) that are not moved will be sidelined.  
I don't have to write any new code for this: the unmoved reference file is 
automatically sidelined at the end of the merge.
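
To illustrate the flow described above, here is a minimal sketch on a local filesystem using java.nio. It is not the HBCK code: the class name, method names, and file names (SidelineSketch, mergeThenSideline, "hfile-1", "ref.parent") are hypothetical, and a real region directory lives on HDFS and has column-family subdirectories that this sketch ignores.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

class SidelineSketch {
    /**
     * Moves the selected files into the new region, then sidelines the whole
     * old region directory. Returns the names of the files that were left
     * behind and therefore sidelined together with the directory.
     */
    static List<String> mergeThenSideline(Path oldRegion, Path newRegion,
                                          Path sidelineRoot, Set<String> toMove)
            throws IOException {
        Files.createDirectories(newRegion);
        for (String name : toMove) {
            Path src = oldRegion.resolve(name);
            if (Files.exists(src)) {
                Files.move(src, newRegion.resolve(name));
            }
        }
        // Whatever is still in the old directory goes away with it.
        List<String> leftBehind = new ArrayList<>();
        try (Stream<Path> entries = Files.list(oldRegion)) {
            entries.forEach(p -> leftBehind.add(p.getFileName().toString()));
        }
        Collections.sort(leftBehind);
        Files.createDirectories(sidelineRoot);
        // A single directory rename; no per-file handling is needed.
        Files.move(oldRegion, sidelineRoot.resolve(oldRegion.getFileName()));
        return leftBehind;
    }

    /** Builds a throwaway layout in a temp directory and runs the flow. */
    static List<String> demo() {
        try {
            Path root = Files.createTempDirectory("sideline-sketch");
            Path oldRegion = Files.createDirectories(root.resolve("old-region"));
            Files.createFile(oldRegion.resolve(".regioninfo"));
            Files.createFile(oldRegion.resolve("hfile-1"));
            Files.createFile(oldRegion.resolve("ref.parent")); // stands in for a reference file
            return mergeThenSideline(oldRegion, root.resolve("new-region"),
                    root.resolve("sideline"), Collections.singleton("hfile-1"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch only "hfile-1" is moved, so the old .regioninfo and the unmoved reference file end up in the sideline directory without any code that handles them individually.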

Yeah, I did see HBASE-15406 when I added the new admin.setCatalogJanitor(false) 
code.  If an HBCK run is killed by Ctrl-C, the Catalog Janitor would stay 
disabled until it is manually re-enabled.  I think Ctrl-C is not common practice, 
and not running the Catalog Janitor for a while is not a big deal (at least to 
me: how often do we split/merge regions and then run major compaction?).  I feel 
that using ZooKeeper to keep track of the Catalog Janitor enable/disable state 
is a little bit overkill for this problem.  That is why I prefer to just have a 
simple disable, with the enable in the finally-block.
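
The disable/enable pattern mentioned above could look roughly like the following sketch. It is not the actual patch: JanitorSwitch is a hypothetical stand-in for the admin call, and InMemorySwitch exists only so the example runs without a cluster.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the admin call discussed above; not the HBase API.
interface JanitorSwitch {
    /** Sets the janitor state and returns the previous state. */
    boolean setEnabled(boolean enabled);
}

class JanitorGuardSketch {
    /** In-memory switch used only for illustration. */
    static final class InMemorySwitch implements JanitorSwitch {
        private final AtomicBoolean enabled = new AtomicBoolean(true);

        @Override
        public boolean setEnabled(boolean value) {
            return enabled.getAndSet(value);
        }

        boolean isEnabled() {
            return enabled.get();
        }
    }

    /** Runs the repair with the janitor disabled, then restores the prior state. */
    static void runWithJanitorDisabled(JanitorSwitch janitor, Runnable repair) {
        boolean previous = janitor.setEnabled(false);
        try {
            repair.run();
        } finally {
            // Runs on normal return and on exceptions. It is not guaranteed to
            // run if the process is killed (e.g. Ctrl-C), which is the case
            // where the janitor stays disabled until someone re-enables it.
            janitor.setEnabled(previous);
        }
    }
}
```

Restoring the previous state, rather than unconditionally enabling, keeps the janitor off if an operator had already disabled it before the run. That is a choice made in this sketch, not something stated in the patch.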


> HBCK unnecessary moves reference files when a table has split region to fix 
> non-existing overlap regions
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-15940
>                 URL: https://issues.apache.org/jira/browse/HBASE-15940
>             Project: HBase
>          Issue Type: Bug
>          Components: hbck
>    Affects Versions: 1.0.0
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>         Attachments: org.apache.hadoop.hbase.util.TestHBaseFsck-output.txt, 
> repro-hbck-repair-healthy-splitted=region.patch, skipReferenceFiles.patch
>
>
> When a repair option (specifically the -fixHdfsOverlaps option) is specified 
> against a table, and the table has split regions (both the parent region and 
> the child regions exist, with reference files), Hbck wrongly thinks that 
> there are overlapping regions and tries to merge them to fix it.  
> This is by design, as the current implementation of Hbck uses HDFS as the 
> trusted source without consulting the META table.
> Here is the comment from one of the unit tests:
> {code}
>       // TODO: fixHdfsHoles does not work against splits, since the parent dir lingers on
>       // for some time until children references are deleted. HBCK erroneously sees this as
>       // overlapping regions
> {code}
> However, this is undesirable.  When the reference files are moved to a new 
> region, the parent region would have no daughter regions and hence it could be 
> cleaned up by CatalogJanitor.  This would create a real inconsistency: 
> lingering reference files.  
> Another bad consequence is that we would merge the split regions back into one.  
> Even though that is undesirable, at least it would not cause more inconsistency.  
> This JIRA does not try to solve the unsplit issue, as that requires a bigger 
> design change in Hbck.  
> This JIRA tries to address the potential lingering reference files 
> issue, as multiple customers using branch-1 have faced it in production.  
> (The workaround is to run major compaction on all split regions before running 
> HBCK; this could take a long time and have production impact.)
> Attached are the log and a modified unit test to repro the issue.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
