[ https://issues.apache.org/jira/browse/HBASE-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491386#comment-13491386 ]

Bing Jiang commented on HBASE-7101:
-----------------------------------

When the znode changes state from SPLITTING to SPLIT, please make sure the
HRegionServer's SplitTransaction waits long enough for the HMaster
AssignmentManager to finish its clean-up work. Perhaps a ZooKeeperWatcher could
be added to the SplitTransaction.
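
The watcher idea can be sketched roughly as follows. This is a minimal, hypothetical model, not HBase or ZooKeeper code: `FakeZnodeStore`, `DeletionWatcher`, and `awaitMasterCleanup` are invented names, and the one-shot deletion callback only loosely mimics a ZooKeeper watch. Instead of re-tickling the node on a fixed 100 ms schedule, the region-server side registers interest in the node and blocks until the master side deletes it or a timeout expires. The znode path in `main` is illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SplitWaitSketch {

    /** One-shot deletion callback, loosely modeled on a ZooKeeper watch (hypothetical). */
    interface DeletionWatcher {
        void nodeDeleted(String path);
    }

    /** Hypothetical in-memory stand-in for the unassigned znode tree. */
    static final class FakeZnodeStore {
        private final Set<String> nodes = ConcurrentHashMap.newKeySet();
        private final Map<String, List<DeletionWatcher>> watchers = new ConcurrentHashMap<>();

        void create(String path) {
            nodes.add(path);
        }

        /** Registers the watcher first, then reports whether the node exists. */
        boolean existsAndWatch(String path, DeletionWatcher watcher) {
            watchers.computeIfAbsent(path, p -> new CopyOnWriteArrayList<>()).add(watcher);
            return nodes.contains(path);
        }

        /** Removes the node, then fires and clears every watcher registered on it. */
        void delete(String path) {
            nodes.remove(path);
            List<DeletionWatcher> fired = watchers.remove(path);
            if (fired != null) {
                fired.forEach(w -> w.nodeDeleted(path));
            }
        }
    }

    /**
     * Region-server side: wait for the master to delete the node instead of
     * re-tickling it. Returns true if the deletion was observed within the timeout.
     */
    static boolean awaitMasterCleanup(FakeZnodeStore store, String path, long timeoutMs)
            throws InterruptedException {
        CountDownLatch deleted = new CountDownLatch(1);
        if (!store.existsAndWatch(path, p -> deleted.countDown())) {
            return true; // master already removed the node
        }
        return deleted.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        FakeZnodeStore store = new FakeZnodeStore();
        // Illustrative path only.
        String path = "/hbase/unassigned/1a1c950ad45812d7b4b9b90ebf268468";
        store.create(path);

        // Simulated master finishing its clean-up about 50 ms later.
        Thread master = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            store.delete(path);
        });
        master.start();

        boolean observed = awaitMasterCleanup(store, path, 5_000);
        master.join();
        System.out.println("clean-up observed: " + observed);
    }
}
```

Registering the watcher before checking existence means a deletion that happens in between is still noticed, so the wait cannot miss the master's clean-up. In ZooKeeper itself, an existence check can set the watch in the same call.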
  
                
> HBase stuck in Region SPLIT 
> ----------------------------
>
>                 Key: HBASE-7101
>                 URL: https://issues.apache.org/jira/browse/HBASE-7101
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>            Reporter: Bing Jiang
>             Fix For: 0.96.0, 0.94.4
>
>
> I found this issue through a znode that had existed for a long time under the
> unassigned parent, while the HMaster kept logging more and more warnings. The
> looping log is below:
> WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 
> 1a1c950ad45812d7b4b9b90ebf268468 not found on server 
> sev0040,60020,1350378314041; failed processing
> WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for 
> region 1a1c950ad45812d7b4b9b90ebf268468 from server 
> sev0040,60020,1350378314041 but it doesn't exist anymore, probably already 
> processed its split
> WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 
> 1a1c950ad45812d7b4b9b90ebf268468 not found on server 
> gs-dpo-sev0040,60020,1350378314041; failed processing
> WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for 
> region 1a1c950ad45812d7b4b9b90ebf268468 from server 
> sev0040,60020,1350378314041 but it doesn't exist anymore, probably already 
> processed its split
> We use HBase 0.92.1, and I traced the problem back to the source code. The
> HMaster AssignmentManager has already deleted the SPLIT region from its
> in-memory structure, but the HRegionServer SplitTransaction finds the
> unassigned/parent znode still existing in a transient state. More precisely,
> SplitTransaction executes tickleNodeSplit to update the znode to a new version
> slightly later than the AssignmentManager deletes the unassigned/parent znode.
> Updating the znode version triggers the handleRegion operation again; however,
> the AssignmentManager finds that the RegionState in memory has already been
> deleted, and the transaction goes into a retry loop.
> In SplitTransaction, transitionZKNode retries tickleNodeSplit after sleeping
> 100 ms. In my opinion, if this interval were much longer than 100 ms, all of
> the AssignmentManager's operations would have time to finish completely.
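
The race and the resulting retry loop can be modeled with a small, single-threaded sketch. Everything here is hypothetical: `SplitRaceSketch`, `Model`, and its methods are not HBase APIs. The model assumes that the master's znode delete is conditional on the version it last read, so a tickle that lands inside the clean-up window makes the delete fail while the in-memory RegionState is already gone. That assumption is one possible reading of the report, not a statement about the 0.92.1 code. The region server's loop is capped at `maxRetries` so the sketch terminates; in the reported scenario it keeps going.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SplitRaceSketch {

    /** Hypothetical stand-in for the unassigned/parent znode. */
    static final class Znode {
        int version;
    }

    static final class Model {
        final Set<String> regionStates = new HashSet<>(); // master's in-memory RegionState
        Znode node = new Znode();                         // null once deleted
        final List<String> warnings = new ArrayList<>();

        /** Region-server side: bump the version (like a tickle); false if the node is gone. */
        boolean tickle() {
            if (node == null) return false;
            node.version++;
            return true;
        }

        /** Master, step 1: forget the region and remember the version it saw. */
        int masterBeginCleanup(String region) {
            regionStates.remove(region);
            return node.version;
        }

        /** Master, step 2 (assumption): delete only if the version is unchanged. */
        boolean masterDelete(int expectedVersion) {
            if (node != null && node.version == expectedVersion) {
                node = null;
                return true;
            }
            return false;
        }

        /** Master reacting to a SPLIT event for a region it may no longer track. */
        void masterHandleSplit(String region) {
            if (!regionStates.contains(region)) {
                warnings.add("Received SPLIT for region " + region + " but it doesn't exist anymore");
            }
        }
    }

    /** Runs one timeline; tickleDuringCleanup decides whether the race window is hit. */
    static Model run(String region, boolean tickleDuringCleanup, int maxRetries) {
        Model m = new Model();
        m.regionStates.add(region);
        int seen = m.masterBeginCleanup(region);
        if (tickleDuringCleanup && m.tickle()) {
            m.masterHandleSplit(region);
        }
        m.masterDelete(seen);
        // Region-server retry loop; each successful tickle triggers the master's handler again.
        for (int i = 0; i < maxRetries && m.tickle(); i++) {
            m.masterHandleSplit(region);
        }
        return m;
    }

    public static void main(String[] args) {
        String region = "1a1c950ad45812d7b4b9b90ebf268468";
        Model ok = run(region, false, 5);
        Model stuck = run(region, true, 5);
        System.out.println("no race: node deleted=" + (ok.node == null) + ", warnings=" + ok.warnings.size());
        System.out.println("race:    node deleted=" + (stuck.node == null) + ", warnings=" + stuck.warnings.size());
    }
}
```

Running `main` prints `no race: node deleted=true, warnings=0` and `race:    node deleted=false, warnings=6`: once a tickle lands in the window, the node is never removed and every further tickle produces another "doesn't exist anymore" warning, matching the looping log above.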

--
This message is automatically generated by JIRA.
