[ 
https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179164#comment-13179164
 ] 

stack commented on HBASE-5094:
------------------------------

Thanks for the fix, Ram.  It does feel, though, like the fix should have been
made on the master side rather than out on the regionserver.  In the master we
have some hope of making sense of what's going on out on the cluster; this
looks like an issue in the master where the shutdown thread and the balancer
thread are fighting over a particular region's state.  And this 'fix' only
addresses the case where the region reassign arrives at the server that already
has the region open.

Could we make it such that only one thread can transition a region at a time?
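
To make that question concrete, here is a minimal sketch of per-region serialization. It is not HBase code: RegionTransitionGuard and tryTransition are hypothetical names, and regions are keyed by a plain String for illustration. The idea is that the balancer and the shutdown handler would both go through the same guard, so a second thread is refused while another one is mid-transition on that region.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper, not part of HBase: serializes transitions per region.
final class RegionTransitionGuard {
  // One lock per region name, created lazily. This sketch never removes entries.
  private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

  /**
   * Runs the action only if no other thread is currently transitioning the
   * given region. Returns true if the action ran, false if it was skipped.
   */
  boolean tryTransition(String regionName, Runnable action) {
    ReentrantLock lock = locks.computeIfAbsent(regionName, k -> new ReentrantLock());
    if (!lock.tryLock()) {
      // Another thread (for example the balancer) owns this region right now.
      return false;
    }
    try {
      // Any state check and the resulting assignment belong inside the action,
      // so that they happen atomically with respect to other transitions.
      action.run();
      return true;
    } finally {
      lock.unlock();
    }
  }
}
```

A caller that gets false back could re-read the region's state and retry later instead of assigning blindly. Whether refusing or blocking is the right behaviour for the shutdown handler is a separate design question.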

Could the shutdown handler not have noticed this state?

{code}
->6)The step 3 continues and he sees addressinAM is null and also RIT is null 
and so he goes with assignment.
{code}

                
> The META can hold an entry for a region with a different server name from the 
> one actually in the AssignmentManager thus making the region inaccessible.
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5094
>                 URL: https://issues.apache.org/jira/browse/HBASE-5094
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>         Attachments: HBASE-5094_1.patch
>
>
> {code}
> RegionState rit =
>     this.services.getAssignmentManager().isRegionInTransition(e.getKey());
> ServerName addressFromAM = this.services.getAssignmentManager()
>     .getRegionServerOfRegion(e.getKey());
> if (rit != null && !rit.isClosing() && !rit.isPendingClose()) {
>   // Skip regions that were in transition unless CLOSING or
>   // PENDING_CLOSE
>   LOG.info("Skip assigning region " + rit.toString());
> } else if (addressFromAM != null
>     && !addressFromAM.equals(this.serverName)) {
>   LOG.debug("Skip assigning region "
>       + e.getKey().getRegionNameAsString()
>       + " because it has been opened in "
>       + addressFromAM.getServerName());
> }
> {code}
> In ServerShutdownHandler we try to get the region's address from the AM.  This
> address is initially null because it has not yet been updated after the region
> was opened, i.e. the callback after node deletion has not yet run on the
> master side.  But removal from RIT has already completed on the master side,
> so this triggers a new assignment.
> So there is a small window between the point where the region is actually
> added to the online list and the point where ServerShutdownHandler checks the
> existing address in the AM.


        
