[
https://issues.apache.org/jira/browse/HBASE-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pankaj Kumar updated HBASE-14889:
---------------------------------
Affects Version/s: 1.0.2
> Region stuck in transition in OPEN state indefinitely in corner scenario
> ------------------------------------------------------------------------
>
> Key: HBASE-14889
> URL: https://issues.apache.org/jira/browse/HBASE-14889
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.14, 1.0.2
> Reporter: Abhishek Singh Chouhan
> Assignee: Pankaj Kumar
>
> During a failure scenario in which an RS dies and the bulk assigner (BA) is
> assigning its regions to other RSs, if a second RS dies (one to which some
> regions are being moved) while a region on it is in PENDING_OPEN state, we end
> up in a situation where two bulk assigners try to assign the same region to
> the same RS.
> The following happened:
> 1. While one BA is opening the region, the second one sees it in PENDING_OPEN
> state, retries, and calls unassign(...), thereby sending a CLOSE RPC to the RS.
> 2. Meanwhile the RS has already opened the region and changed the znode
> state to RS_ZK_REGION_OPENED, which triggers an event on the master.
> 3. On the master, after the unassign succeeds, we go on to delete the znode,
> change the region state to PENDING_OPEN, and send an OPEN RPC to the RS.
> 4. The event triggered earlier now sees the state as PENDING_OPEN and happily
> changes it to OPEN, but is unable to delete the znode, which by this time is
> no longer in RS_ZK_REGION_OPENED state but in M_ZK_REGION_OFFLINE state. Hence
> the region remains in transition in the OPEN state.
> 5. The RS goes on changing the znode states and successfully opens the region
> (changing the znode state to RS_ZK_REGION_OPENED).
> 6. This again triggers an event on the master, but this time, since the state
> is OPEN, the following code path is taken:
> {noformat}
> case RS_ZK_REGION_OPENED:
>   // Should see OPENED after OPENING but possible after PENDING_OPEN.
>   if (regionState == null
>       || !regionState.isPendingOpenOrOpeningOnServer(sn)) {
>     LOG.warn("Received OPENED for " + prettyPrintedRegionName
>       + " from " + sn + " but the region isn't PENDING_OPEN/OPENING here: "
>       + regionStates.getRegionState(encodedName));
>     if (regionState != null) {
>       // Close it without updating the internal region states,
>       // so as not to create double assignments in unlucky scenarios
>       // mentioned in OpenRegionHandler#process
>       unassign(regionState.getRegion(), null, -1, null, false, sn);
>     }
>     return;
>   }
> {noformat}
> We call unassign here with transitionInZK=false and state=null.
> 7. The RS closes the region but doesn't update ZK, and the state is not changed
> in the master either. The region remains in transition in the OPEN state when
> it is actually closed. We have to restart the RS, after which the region opens
> correctly on some other RS.
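
The sequence above can be replayed deterministically in a small model. The following is only a sketch under stated assumptions, not HBase code: the class StuckRegionSketch, its fields, and the method handleOpenedEvent are hypothetical names, and the master-side handling is reduced to the two branches described in steps 4 and 6.

```java
public class StuckRegionSketch {
    // Hypothetical, simplified stand-ins for the master region state and znode state.
    enum MasterState { PENDING_OPEN, OPENING, OPEN }
    enum ZNode { M_ZK_REGION_OFFLINE, RS_ZK_REGION_OPENED, DELETED }

    MasterState masterState = MasterState.PENDING_OPEN;
    ZNode znode = ZNode.M_ZK_REGION_OFFLINE;
    boolean openOnRs = false;      // whether the RS is actually serving the region
    boolean inTransition = true;   // whether the master still lists the region in transition

    // Simplified model of the master handling an RS_ZK_REGION_OPENED event.
    void handleOpenedEvent() {
        boolean expected = masterState == MasterState.PENDING_OPEN
                || masterState == MasterState.OPENING;
        if (!expected) {
            // Step 6/7: unassign with transitionInZK=false and state=null.
            // The RS closes the region; ZK and the master state are left untouched.
            openOnRs = false;
            return;
        }
        masterState = MasterState.OPEN;
        if (znode == ZNode.RS_ZK_REGION_OPENED) {
            znode = ZNode.DELETED;
            inTransition = false;  // normal completion of the transition
        }
        // Otherwise (step 4) the znode cannot be deleted and the region
        // stays in transition even though the master state is OPEN.
    }

    // Replays steps 1 to 7 in the order given in the description.
    static StuckRegionSketch replay() {
        StuckRegionSketch s = new StuckRegionSketch();
        // Steps 1-2: the RS opens the region; the resulting event is delayed on the master.
        s.openOnRs = true;
        s.znode = ZNode.RS_ZK_REGION_OPENED;
        // Step 3: the second BA's unassign succeeds; the znode is deleted and,
        // per step 4, is in M_ZK_REGION_OFFLINE by the time the event is handled.
        s.openOnRs = false;
        s.znode = ZNode.M_ZK_REGION_OFFLINE;
        s.masterState = MasterState.PENDING_OPEN;
        // Step 4: the delayed event from step 2 is handled.
        s.handleOpenedEvent();
        // Step 5: the RS opens the region again.
        s.openOnRs = true;
        s.znode = ZNode.RS_ZK_REGION_OPENED;
        // Steps 6-7: the new event arrives while the master state is already OPEN.
        s.handleOpenedEvent();
        return s;
    }

    public static void main(String[] args) {
        StuckRegionSketch s = replay();
        System.out.println("master=" + s.masterState + " znode=" + s.znode
                + " openOnRs=" + s.openOnRs + " inTransition=" + s.inTransition);
    }
}
```

In this model the replay ends with the master state OPEN and the region still in transition, while the RS no longer serves it, which matches the outcome described in step 7.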
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)