[ 
https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177496#comment-13177496
 ] 

ramkrishna.s.vasudevan commented on HBASE-5094:
-----------------------------------------------

Steps to reproduce the problem:
->1) The load balancer starts moving region R1 from RS1 to RS2.
->2) RS1 goes down before RS2 has updated the META table.
->3) ServerShutdownHandler starts for RS1:
        a) it first removes region R1 from the online list in the master,
        b) and it sees R1 on RS1 as per the META entry.
->4) At that point RS2 completes the open and updates META.
->5) The callback comes to the master, which removes the region from RIT but 
has not yet added it to the online region list in the master.
->6) Step 3 continues: the handler sees that the address in the AM 
(addressFromAM) is null and that RIT is also null, so it goes ahead with 
assignment.
->7) R1 is now updated to RS3 in META and the operation completes, so the 
master also stores in its online list that R1 is on RS3.
->8) Now RS3 goes down.
->9) Region R1 gets reassigned from RS3 to RS2, and RS2 replies ALREADY_OPENED.
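
The interleaving in steps 1 to 7 can be replayed with a small toy model. This is only a sketch to make the ordering concrete: the class AssignmentRaceSketch, its fields and its methods are hypothetical names and not HBase classes, and the master state is reduced to plain maps and sets (an assumption; the real AssignmentManager tracks much more, including the CLOSING and PENDING_CLOSE states).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the race; every name here is hypothetical, not an HBase API. */
class AssignmentRaceSketch {
    final Map<String, String> meta = new HashMap<>();           // region -> server as per META
    final Set<String> regionsInTransition = new HashSet<>();    // simplified RIT
    final Map<String, String> onlineRegions = new HashMap<>();  // master's online list
    final Map<String, Set<String>> openOn = new HashMap<>();    // server -> regions it really hosts

    void open(String server, String region) {
        openOn.computeIfAbsent(server, s -> new HashSet<>()).add(region);
    }

    /** Replays steps 1-7 in the problematic order; returns the servers hosting R1 afterwards. */
    Set<String> replay() {
        String r1 = "R1";
        // Initial state: R1 is served by RS1.
        meta.put(r1, "RS1");
        onlineRegions.put(r1, "RS1");
        open("RS1", r1);

        // Step 1: balancer starts moving R1 from RS1 to RS2, so R1 is in transition.
        regionsInTransition.add(r1);
        // Step 2: RS1 dies before RS2 has updated META.
        openOn.remove("RS1");
        // Step 3a: the shutdown handler removes R1 from the online list.
        onlineRegions.remove(r1);
        // Step 3b: the handler reads META and still sees R1 on RS1.
        boolean metaSaysDeadServer = "RS1".equals(meta.get(r1));
        // Step 4: RS2 completes the open and updates META.
        open("RS2", r1);
        meta.put(r1, "RS2");
        // Step 5: callback removes R1 from RIT; the online list is NOT yet updated.
        regionsInTransition.remove(r1);
        // Step 6: the handler resumes; RIT is empty and the AM has no address.
        boolean inTransition = regionsInTransition.contains(r1);
        String addressFromAM = onlineRegions.get(r1);
        if (metaSaysDeadServer && !inTransition && addressFromAM == null) {
            // Step 7: R1 is assigned again, this time to RS3.
            open("RS3", r1);
            meta.put(r1, "RS3");
            onlineRegions.put(r1, "RS3");
        }

        // Collect every server that actually has R1 open.
        Set<String> hosts = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : openOn.entrySet()) {
            if (e.getValue().contains(r1)) {
                hosts.add(e.getKey());
            }
        }
        return hosts;
    }
}
```

In this model R1 ends up open on both RS2 and RS3 while META and the master's online list point only at RS3, which is the state from which steps 8 and 9 produce the ALREADY_OPENED reply.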

                
> The META can hold an entry for a region with a different server name from the 
> one actually in the AssignmentManager thus making the region inaccessible.
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5094
>                 URL: https://issues.apache.org/jira/browse/HBASE-5094
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.0
>            Reporter: ramkrishna.s.vasudevan
>            Priority: Critical
>         Attachments: HBASE-5094_1.patch
>
>
> {code}
> RegionState rit =
>     this.services.getAssignmentManager().isRegionInTransition(e.getKey());
> ServerName addressFromAM = this.services.getAssignmentManager()
>     .getRegionServerOfRegion(e.getKey());
> if (rit != null && !rit.isClosing() && !rit.isPendingClose()) {
>   // Skip regions that were in transition unless CLOSING or
>   // PENDING_CLOSE
>   LOG.info("Skip assigning region " + rit.toString());
> } else if (addressFromAM != null
>     && !addressFromAM.equals(this.serverName)) {
>   LOG.debug("Skip assigning region "
>       + e.getKey().getRegionNameAsString()
>       + " because it has been opened in "
>       + addressFromAM.getServerName());
> }
> {code}
> In ServerShutdownHandler we try to get the address from the AM.  This address 
> is initially null because it has not yet been updated after the region was 
> opened, i.e. the callback after node deletion has not yet run on the master 
> side.  But removal from RIT has already completed on the master side, so this 
> triggers a new assignment.
> So there is a small window between the point where the region is actually 
> added to the online list and the point where ServerShutdownHandler checks the 
> existing address in the AM.
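
The check described above can be isolated as a small decision function to show which input combination falls through to assignment. This is a hedged sketch, not the HBase implementation: ShutdownCheckSketch, Decision and decide are hypothetical names, server names are plain strings, and the final ASSIGN branch is an assumption based on the description, since the quoted snippet stops before it.

```java
/** Simplified mirror of the quoted check; all names are hypothetical. */
class ShutdownCheckSketch {
    enum Decision { SKIP_IN_TRANSITION, SKIP_OPENED_ELSEWHERE, ASSIGN }

    /**
     * @param inTransition           true if the region has a RIT entry (rit != null)
     * @param closingOrPendingClose  true if that entry is CLOSING or PENDING_CLOSE
     * @param addressFromAM          server the AM has for the region, or null if none
     * @param deadServer             the server being handled by the shutdown handler
     */
    static Decision decide(boolean inTransition, boolean closingOrPendingClose,
                           String addressFromAM, String deadServer) {
        if (inTransition && !closingOrPendingClose) {
            // Region is in transition and not closing: leave it alone.
            return Decision.SKIP_IN_TRANSITION;
        }
        if (addressFromAM != null && !addressFromAM.equals(deadServer)) {
            // The AM already knows the region is open on another server.
            return Decision.SKIP_OPENED_ELSEWHERE;
        }
        // Assumed fall-through: neither guard matched, so the region is assigned.
        return Decision.ASSIGN;
    }
}
```

During the window, the RIT entry is already gone and the AM has no address yet, so decide(false, false, null, "RS1") falls through to ASSIGN even though RS2 has just opened the region. Once the online list has been updated, the same call with "RS2" as the address would skip the assignment.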
