[ 
https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151356#comment-13151356
 ] 

Max Lapan commented on HBASE-4799:
----------------------------------

I tried to attach the files in reverse order, but the form sorts them 
alphabetically, so I am not sure :).

We hit this problem right after upgrading from 0.90.3-cdh3u1 to 0.90.4-cdh3u2 a 
month ago. I tested both patches today, and they work. The main reason I 
separated them is that the 'temporary fix' is not needed in the long term: it 
just tells the janitor to remove leaked regions (we had about 15 TB of garbage 
on a 50 TB table). It could also be dangerous in some situations, for example 
when a region has started to split and the RS crashed. So only 0001 should be 
committed.
                
> Catalog Janitor logic bug causes region leackage
> ------------------------------------------------
>
>                 Key: HBASE-4799
>                 URL: https://issues.apache.org/jira/browse/HBASE-4799
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.4
>            Reporter: Max Lapan
>            Assignee: Max Lapan
>            Priority: Critical
>         Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 
> 0002-Temporary-fix-to-remove-leaked-regions.patch
>
>
> When a region split takes a significant amount of time, CatalogJanitor can 
> clean up one of the SPLIT records but leave the other in META. When the other 
> split finishes, the janitor cleans up the remaining SPLIT record, but the 
> parent region is not removed from the FS and its META row is not cleared.
> The race condition is as follows:
> 1. A region split starts.
> 2. One of the daughter regions, say A, finishes splitting (it has no 
> reference store files), but the other (B) has not.
> 3. The janitor starts and, in the checkDaughter routine, removes SPLITA from 
> META, but sees that SPLITB still has references and does nothing.
> 4. Region B completes its split.
> 5. The janitor wakes up and removes SPLITB, but sees that there is no record 
> for A and again does nothing.
> Result: the parent region lingers forever.
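
The five steps in the description can be replayed with a small toy model. This is only an illustrative sketch, not HBase code: the names Parent, buggy_scan, careful_scan and run are hypothetical, and careful_scan shows just one possible way to avoid the race (do not touch the SPLIT markers until both daughters are free of references). It is not claimed to be what patch 0001 does.

```python
from dataclasses import dataclass, field


@dataclass
class Parent:
    """Toy stand-in for a split parent's META row (hypothetical, not HBase code)."""
    # Daughter name -> True while that daughter still holds reference store files.
    has_refs: dict
    # SPLITA / SPLITB markers still present in the parent's row.
    split_markers: set = field(default_factory=lambda: {"A", "B"})
    removed: bool = False


def buggy_scan(parent):
    """Janitor pass as described in the report: markers are dropped one by one."""
    cleared = []
    for daughter in ("A", "B"):
        if daughter in parent.split_markers and not parent.has_refs[daughter]:
            # Side effect happens even though the sibling may not be done yet.
            parent.split_markers.discard(daughter)
            cleared.append(daughter)
    # The parent is deleted only if both markers were seen and cleared in this pass.
    if cleared == ["A", "B"]:
        parent.removed = True


def careful_scan(parent):
    """Assumed alternative: change nothing until both daughters are reference-free."""
    if not any(parent.has_refs.values()):
        parent.split_markers.clear()
        parent.removed = True


def run(scan):
    parent = Parent(has_refs={"A": False, "B": True})  # steps 1-2: A done, B not
    scan(parent)                                       # step 3: first janitor pass
    parent.has_refs["B"] = False                       # step 4: B completes
    scan(parent)                                       # step 5: second janitor pass
    return parent


leaked = run(buggy_scan)
cleaned = run(careful_scan)
print(leaked.removed, cleaned.removed)  # prints: False True
```

With buggy_scan both markers end up removed but the parent is never deleted, which matches the leak described above. With careful_scan the first pass is a no-op and the second pass removes the parent.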


        
