[ 
https://issues.apache.org/jira/browse/HBASE-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089473#comment-15089473
 ] 

Stephen Yuan Jiang commented on HBASE-14889:
--------------------------------------------

[~pankaj2461], how is your testing of the patch going? Could you post the 
patch here so that we can make progress?

> Region stuck in transition in OPEN state indefinitely in corner scenario
> ------------------------------------------------------------------------
>
>                 Key: HBASE-14889
>                 URL: https://issues.apache.org/jira/browse/HBASE-14889
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.14, 1.0.2
>            Reporter: Abhishek Singh Chouhan
>            Assignee: Pankaj Kumar
>
> During a failure scenario, when an RS dies and the bulk assigner (BA) is 
> assigning its regions to other RSs, if another RS dies (one to which some 
> regions are being moved and on which a region is in PENDING_OPEN state), we 
> end up in a situation where two bulk assigners try to assign the same region 
> to the same RS.
> The following happened:
> 1. While one BA is opening the region, the second one sees it in 
> PENDING_OPEN state, retries, and calls unassign(...), thereby sending a 
> CLOSE RPC to the RS.
> 2. Meanwhile, the RS has already opened the region and changed the znode 
> state to RS_ZK_REGION_OPENED, which triggers an event on the master.
> 3. On the master, after the unassign succeeds, we go on to delete the znode, 
> change the region state to PENDING_OPEN, and send an OPEN RPC to the RS.
> 4. The event triggered earlier now sees the state as PENDING_OPEN and happily 
> changes it to OPEN, but it is unable to delete the znode, which by this time 
> is no longer in RS_ZK_REGION_OPENED state but in M_ZK_REGION_OFFLINE state. 
> Hence the region remains in transition in the OPEN state.
> 5. The RS goes on to change the znode states and successfully opens the 
> region (changing the znode state to RS_ZK_REGION_OPENED).
> 6. This again triggers an event on the master, but this time, since the 
> state is OPEN, the following code path is taken:
> {noformat}
> case RS_ZK_REGION_OPENED:
>           // Should see OPENED after OPENING but possible after PENDING_OPEN.
>           if (regionState == null
>               || !regionState.isPendingOpenOrOpeningOnServer(sn)) {
>             LOG.warn("Received OPENED for " + prettyPrintedRegionName
>               + " from " + sn + " but the region isn't PENDING_OPEN/OPENING here: "
>               + regionStates.getRegionState(encodedName));
>             if (regionState != null) {
>               // Close it without updating the internal region states,
>               // so as not to create double assignments in unlucky scenarios
>               // mentioned in OpenRegionHandler#process
>               unassign(regionState.getRegion(), null, -1, null, false, sn);
>             }
>             return;
>           }
> {noformat}
> We call unassign here with transitionInZK=false and state=null.
> 7. The RS closes the region but doesn't update ZK, and the state is not 
> changed in the master. The region remains in transition in the OPEN state 
> when it is actually closed. We have to restart the RS, after which the 
> region opens correctly on some other RS.
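
For readers following the sequence in the description, the seven steps can be replayed as a toy, single-threaded state machine. This is only a sketch under stated assumptions: the class, enums, fields, and methods below are hypothetical stand-ins and not HBase's AssignmentManager API, and step 3 is simplified by assuming the znode is re-created as M_ZK_REGION_OFFLINE right after it is deleted.

```java
// Hypothetical model of the race in HBASE-14889; none of these names are HBase APIs.
class StuckRegionSketch {
    // Simplified stand-ins for the master's in-memory region state and the ZK znode state.
    enum RegionState { PENDING_OPEN, OPENING, OPEN }
    enum ZNodeState { M_ZK_REGION_OFFLINE, RS_ZK_REGION_OPENED, DELETED }

    RegionState masterState = RegionState.PENDING_OPEN;
    ZNodeState znode = ZNodeState.M_ZK_REGION_OFFLINE;
    boolean openOnRegionServer = false; // what the RS actually serves
    boolean inTransition = true;        // master still tracks the region as in transition

    // Master-side handling of an RS_ZK_REGION_OPENED event, reduced to the quoted guard.
    void onOpenedEvent() {
        boolean pendingOpenOrOpening = masterState == RegionState.PENDING_OPEN
            || masterState == RegionState.OPENING;
        if (!pendingOpenOrOpening) {
            // Models unassign(..., transitionInZK=false, state=null): the RS closes
            // the region, but neither the znode nor the master state is updated.
            openOnRegionServer = false;
            return;
        }
        masterState = RegionState.OPEN;
        if (znode == ZNodeState.RS_ZK_REGION_OPENED) {
            znode = ZNodeState.DELETED;
            inTransition = false;
        }
        // Otherwise the znode cannot be deleted and the region stays in transition.
    }

    // Replays steps 1-7 from the description in the order they were observed.
    static StuckRegionSketch replay() {
        StuckRegionSketch s = new StuckRegionSketch();
        // Steps 1-2: the first BA's open succeeds; the OPENED event is queued, not yet handled.
        s.openOnRegionServer = true;
        s.znode = ZNodeState.RS_ZK_REGION_OPENED;
        // Steps 1 and 3: the second BA's unassign closes the region, the znode is
        // reset (assumed: re-created OFFLINE), and a new OPEN RPC is sent.
        s.openOnRegionServer = false;
        s.znode = ZNodeState.M_ZK_REGION_OFFLINE;
        s.masterState = RegionState.PENDING_OPEN;
        // Step 4: the stale OPENED event is handled against the new PENDING_OPEN state.
        s.onOpenedEvent();
        // Step 5: the RS finishes the second open.
        s.openOnRegionServer = true;
        s.znode = ZNodeState.RS_ZK_REGION_OPENED;
        // Steps 6-7: the second OPENED event finds the state already OPEN.
        s.onOpenedEvent();
        return s;
    }

    public static void main(String[] args) {
        StuckRegionSketch s = replay();
        System.out.println("master state: " + s.masterState
            + ", znode: " + s.znode
            + ", in transition: " + s.inTransition
            + ", open on RS: " + s.openOnRegionServer);
    }
}
```

In this model the replay ends with the master believing the region is OPEN and still in transition while the RS no longer serves it, which is the stuck condition described in step 7.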



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
