[
https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177496#comment-13177496
]
ramkrishna.s.vasudevan commented on HBASE-5094:
-----------------------------------------------
Steps to reproduce the problem:
-> 1) The load balancer starts moving region R1 from RS1 to RS2.
-> 2) Before RS2 has updated the META table, RS1 goes down.
-> 3) ServerShutdownHandler starts for RS1:
a) it first removes region R1 from the online list in the master,
b) and it sees R1 with RS1 as per the META entry.
-> 4) At that point RS2 completes the open and updates META.
-> 5) The callback reaches the master, which removes the region from RIT but
has not yet added it to the online region list in the master.
-> 6) Step 3 continues: the handler sees that addressFromAM is null and that
RIT is also null, so it goes ahead with the assignment.
-> 7) R1 is now updated in META as being on RS3 and the operation completes, so
the master also stores in its online list that R1 is on RS3.
-> 8) Now RS3 goes down.
-> 9) Region R1 is reassigned from RS3 to RS2, and RS2 says ALREADY_OPENED.
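
The window in steps 5 and 6 can be illustrated with a minimal, single-threaded
sketch. This is not HBase code: the two maps are hypothetical stand-ins for the
master's regions-in-transition and online-region state, the state strings are
made up for illustration, and wouldReassign only mirrors the shape of the
ServerShutdownHandler check quoted in the issue description.

```java
import java.util.HashMap;
import java.util.Map;

class AssignmentRaceSketch {
    // Hypothetical stand-ins for the master's in-memory state (not HBase classes).
    static final Map<String, String> regionsInTransition = new HashMap<>();
    static final Map<String, String> onlineRegions = new HashMap<>();

    /** Mirrors the shape of the shutdown-handler check; returns true if it would reassign. */
    static boolean wouldReassign(String region, String deadServer) {
        String rit = regionsInTransition.get(region);
        String addressFromAM = onlineRegions.get(region);
        if (rit != null && !rit.equals("CLOSING") && !rit.equals("PENDING_CLOSE")) {
            return false; // skipped: region is in transition
        } else if (addressFromAM != null && !addressFromAM.equals(deadServer)) {
            return false; // skipped: region already opened on another server
        }
        return true; // neither guard matched, so the handler assigns the region
    }

    public static void main(String[] args) {
        // Steps 1-3: R1 is moving from RS1 to RS2; it is in transition and not online.
        regionsInTransition.put("R1", "OPENING");
        System.out.println("while in RIT:   reassign=" + wouldReassign("R1", "RS1"));

        // Step 5, first half: the callback removed R1 from RIT,
        // but the online list has not been updated yet.
        regionsInTransition.remove("R1");
        System.out.println("in the window:  reassign=" + wouldReassign("R1", "RS1"));

        // Step 5, second half: the online list now records R1 on RS2.
        onlineRegions.put("R1", "RS2");
        System.out.println("after callback: reassign=" + wouldReassign("R1", "RS1"));
    }
}
```

Only the middle call returns true in this sketch: with R1 in neither map, both
guards fall through, which corresponds to the extra assignment in step 6.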
> The META can hold an entry for a region with a different server name from the
> one actually in the AssignmentManager thus making the region inaccessible.
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-5094
> URL: https://issues.apache.org/jira/browse/HBASE-5094
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.0
> Reporter: ramkrishna.s.vasudevan
> Priority: Critical
> Attachments: HBASE-5094_1.patch
>
>
> {code}
> RegionState rit =
>     this.services.getAssignmentManager().isRegionInTransition(e.getKey());
> ServerName addressFromAM = this.services.getAssignmentManager()
>     .getRegionServerOfRegion(e.getKey());
> if (rit != null && !rit.isClosing() && !rit.isPendingClose()) {
>   // Skip regions that were in transition unless CLOSING or
>   // PENDING_CLOSE
>   LOG.info("Skip assigning region " + rit.toString());
> } else if (addressFromAM != null
>     && !addressFromAM.equals(this.serverName)) {
>   LOG.debug("Skip assigning region "
>       + e.getKey().getRegionNameAsString()
>       + " because it has been opened in "
>       + addressFromAM.getServerName());
> }
> {code}
> In ServerShutdownHandler we try to get the address from the AM. This address
> is initially null because it has not yet been updated after the region was
> opened, i.e. the callback after node deletion has not yet run on the master
> side. But removal from RIT has already completed on the master side, so this
> triggers a new assignment.
> So there is a small window between the removal from RIT and the region
> actually being added to the online list, during which ServerShutdownHandler
> can check the existing address in the AM.