[ 
https://issues.apache.org/jira/browse/HBASE-22330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh Chouhan reopened HBASE-22330:
--------------------------------------------

Hardening the test

> Backport HBASE-20724 (Sometimes some compacted storefiles are still opened 
> after region failover) to branch-1
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-22330
>                 URL: https://issues.apache.org/jira/browse/HBASE-22330
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Compaction, regionserver
>    Affects Versions: 1.5.0, 1.4.9, 1.3.4
>            Reporter: Andrew Purtell
>            Assignee: Abhishek Singh Chouhan
>            Priority: Major
>             Fix For: 1.5.0, 1.3.5, 1.4.11
>
>         Attachments: HBASE-22330.branch-1.001.patch, 
> HBASE-22330.branch-1.002.patch, HBASE-22330.branch-1.3.001.patch
>
>
> There appears to be a race condition between close and split which, when 
> combined with a side effect of HBASE-20704, leads to the parent region's store 
> files being archived and cleared while daughter regions still hold 
> references to them.
> Here is the timeline of events observed for an affected region:
>  # RS1 encounters a ZooKeeper connectivity issue for the master node and 
> starts shutting itself down. As part of this, it starts to close the store 
> and clean up the compacted files (File A).
>  # Master starts bulk assigning regions and assigns the parent region to RS2.
>  # The region opens on RS2 and ends up opening compacted store file(s) (we 
> suspect this is due to HBASE-20724).
>  # A split now happens; the daughter regions open on RS2 and try to run a 
> compaction as part of post-open.
>  # The split request is complete at this point. However, archiving now 
> proceeds on RS1 and ends up archiving the store file that is referenced by 
> the daughter. Compaction fails with a FileNotFoundException, and all 
> subsequent attempts to open the region fail until the problem is resolved 
> manually.
> We think having HBASE-20724 would help in such situations since we won't end 
> up loading compacted store files in the first place. 
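
To illustrate the idea of "not loading compacted store files in the first place", here is a minimal, self-contained sketch. It is not HBase code and not the HBASE-20724 patch: the CompactedFileFilter class, the StoreFile type, and the filesToOpen method are hypothetical names, and the sketch assumes each store file records the names of the files that were compacted to produce it. Under that assumption, a region opening on a new server can skip any file that another file on disk already lists as a compaction input.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration only; none of these names are HBase APIs.
public class CompactedFileFilter {

    // Stand-in for a store file plus the (assumed) record of which
    // files were compacted to produce it.
    static final class StoreFile {
        final String name;
        final Set<String> compactedInputs;

        StoreFile(String name, Set<String> compactedInputs) {
            this.name = name;
            this.compactedInputs = compactedInputs;
        }
    }

    // Returns the names of the files a region should load on open: every file
    // that is not listed as a compaction input of another file on disk.
    static List<String> filesToOpen(List<StoreFile> onDisk) {
        Set<String> compactedAway = new HashSet<>();
        for (StoreFile f : onDisk) {
            compactedAway.addAll(f.compactedInputs);
        }
        List<String> live = new ArrayList<>();
        for (StoreFile f : onDisk) {
            if (!compactedAway.contains(f.name)) {
                live.add(f.name);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        // File A and File B were compacted into File C, but A and B have not
        // been archived yet (the old region server is still shutting down).
        Set<String> inputs = new HashSet<>();
        inputs.add("A");
        inputs.add("B");

        List<StoreFile> onDisk = new ArrayList<>();
        onDisk.add(new StoreFile("A", new HashSet<>()));
        onDisk.add(new StoreFile("B", new HashSet<>()));
        onDisk.add(new StoreFile("C", inputs));

        System.out.println(filesToOpen(onDisk)); // prints [C]
    }
}
```

In this model the region opening on RS2 would load only File C, so a later split would not create daughter references to File A, and archiving File A on RS1 would no longer break the daughters.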



