[
https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180214#comment-13180214
]
Ming Ma commented on HBASE-5094:
--------------------------------
Ram, it turns out that my patch, which is based on an earlier snapshot of the
0.92 code base, is quite similar to the fix in HBASE-4899. In fact, the way the
bug is reproduced is also similar. Still, there seems to be a very small time
window that neither my fix nor HBASE-4899 covers. The code below refers to the
new code added in HBASE-4899.
T1. ServerShutdownHandler: the check "if (rit != null && !rit.isClosing()
&& !rit.isPendingClose())" returns false because the region is still in the
closing state on the Master. The region has actually been closed by the RS;
the Master's state is still "closing" only because the ZK notification is delayed.
T2. Right after the above check, the ZK notification arrives and the Master
starts opening the region as requested by the load balancer.
T3. "else { this.services.getAssignmentManager().assign(e.getKey(), true); }"
is called for another assignment.
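
The T1-T3 interleaving can be sketched with a small, deterministic toy model.
This is only an illustration under stated assumptions: ToyMaster, State and
countAssigns are hypothetical names invented for the sketch, not HBase classes,
and the model tracks a single region and ignores the addressFromAM branch and
the PENDING_CLOSE state.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the T1-T3 interleaving; none of these types are HBase classes.
class DoubleAssignRaceSketch {
    enum State { CLOSING, OPENING }

    /** Hypothetical stand-in for the Master's view of a single region. */
    static class ToyMaster {
        State rit = State.CLOSING;  // Master still believes the region is closing
        final List<String> assigns = new ArrayList<>();

        // Same shape as the quoted check: skip if in transition but not CLOSING.
        boolean handlerWouldSkip() {
            return rit != null && rit != State.CLOSING;
        }

        // T2: the delayed ZK notification lands and the balancer's open starts.
        void zkNotificationThenBalancerOpens() {
            assigns.add("balancer");
            rit = State.OPENING;
        }

        // T3: the handler acts on whatever it decided at check time.
        void handlerActs(boolean skip) {
            if (!skip) {
                assigns.add("ServerShutdownHandler");
            }
        }
    }

    /** Returns how many assignments are issued for the region. */
    static int countAssigns(boolean zkArrivesBeforeCheck) {
        ToyMaster m = new ToyMaster();
        if (zkArrivesBeforeCheck) {
            m.zkNotificationThenBalancerOpens();
            m.handlerActs(m.handlerWouldSkip());
        } else {
            boolean skip = m.handlerWouldSkip();   // T1: state is still CLOSING
            m.zkNotificationThenBalancerOpens();   // T2
            m.handlerActs(skip);                   // T3: decision is now stale
        }
        return m.assigns.size();
    }

    public static void main(String[] args) {
        System.out.println("ZK after check:  " + countAssigns(false) + " assignment(s)");
        System.out.println("ZK before check: " + countAssigns(true) + " assignment(s)");
    }
}
```

In this model the region is assigned twice only when the notification lands
between the check and the assign call, which is the window described above.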
> The META can hold an entry for a region with a different server name from the
> one actually in the AssignmentManager thus making the region inaccessible.
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-5094
> URL: https://issues.apache.org/jira/browse/HBASE-5094
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Priority: Critical
> Attachments: HBASE-5094_1.patch
>
>
> {code}
> RegionState rit =
>     this.services.getAssignmentManager().isRegionInTransition(e.getKey());
> ServerName addressFromAM = this.services.getAssignmentManager()
>     .getRegionServerOfRegion(e.getKey());
> if (rit != null && !rit.isClosing() && !rit.isPendingClose()) {
>   // Skip regions that were in transition unless CLOSING or
>   // PENDING_CLOSE
>   LOG.info("Skip assigning region " + rit.toString());
> } else if (addressFromAM != null
>     && !addressFromAM.equals(this.serverName)) {
>   LOG.debug("Skip assigning region "
>       + e.getKey().getRegionNameAsString()
>       + " because it has been opened in "
>       + addressFromAM.getServerName());
> }
> {code}
> In ServerShutdownHandler we try to get the region's address from the AM. This
> address is initially null because it has not yet been updated after the region
> was opened, i.e. the callback after node deletion has not yet run on the
> master side. But removal from RIT has already completed on the master side,
> so this triggers a new assignment.
> So there is a small window between the point where the region is actually
> added to the online list and the point where ServerShutdownHandler checks the
> existing address in the AM.
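
The window in the quoted description can be modelled in the same toy style.
Again this is a sketch under assumptions, not the AssignmentManager code: the
two collections, the region and server names, and the method names are all
hypothetical, and the CLOSING / PENDING_CLOSE cases are left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model: RIT removal and the online-list update happen as two separate steps.
class OnlineListWindowSketch {
    static final String REGION = "region-1";   // hypothetical region name
    static final String DEAD_RS = "rs-dead";   // server being processed by the handler
    static final String NEW_RS = "rs-new";     // server that just opened the region

    final Set<String> regionsInTransition = new HashSet<>();
    final Map<String, String> onlineRegions = new HashMap<>();

    // Follows the branch structure of the quoted snippet (simplified).
    // Returns true when the handler would trigger a new assignment.
    boolean handlerWouldAssign() {
        boolean inTransition = regionsInTransition.contains(REGION);
        String addressFromAM = onlineRegions.get(REGION);
        if (inTransition) {
            return false;  // skipped as "in transition"
        }
        if (addressFromAM != null && !addressFromAM.equals(DEAD_RS)) {
            return false;  // skipped as "already opened elsewhere"
        }
        return true;
    }

    // Handler check lands after RIT removal but before the online-list update.
    static boolean checkDuringWindow() {
        OnlineListWindowSketch am = new OnlineListWindowSketch();
        am.regionsInTransition.add(REGION);
        am.regionsInTransition.remove(REGION);   // removal from RIT completed
        return am.handlerWouldAssign();          // address is still null here
    }

    // Handler check lands after the callback has updated the online list.
    static boolean checkAfterCallback() {
        OnlineListWindowSketch am = new OnlineListWindowSketch();
        am.regionsInTransition.add(REGION);
        am.regionsInTransition.remove(REGION);
        am.onlineRegions.put(REGION, NEW_RS);    // callback after node deletion
        return am.handlerWouldAssign();
    }

    public static void main(String[] args) {
        System.out.println("during window:  assign=" + checkDuringWindow());
        System.out.println("after callback: assign=" + checkAfterCallback());
    }
}
```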