[ https://issues.apache.org/jira/browse/HBASE-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160677#comment-13160677 ]
chunhui shen commented on HBASE-4899:
-------------------------------------

Test results from my QA environment:
{code}
Results :

Tests run: 1175, Failures: 0, Errors: 0, Skipped: 9

[INFO]
[INFO] --- maven-surefire-plugin:2.11-TRUNK-HBASE-2:test (secondPartTestsExecution) @ hbase ---
[INFO] Tests are skipped.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:44:10.984s
[INFO] Finished at: Thu Dec 01 14:10:34 CST 2011
[INFO] Final Memory: 35M/380M
[INFO] ------------------------------------------------------------------------
{code}
Please check!

> Region would be assigned twice easily with continually killing server and
> moving region in testing environment
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-4899
>                 URL: https://issues.apache.org/jira/browse/HBASE-4899
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>         Attachments: hbase-4899.patch, hbase-4899v2.patch, hbase-4899v3.patch
>
>
> Before assigning a region, ServerShutdownHandler#process checks whether the
> region is in RIT (regions in transition).
> However, this check does not work as expected in the following case:
> 1. Move region A from server B to server C.
> 2. Kill server B.
> 3. Start server B immediately.
> Let's see what happens in the code for the above case:
> {code}
> for step1:
> 1.1 server B closes region A,
> 1.2 master calls setOffline for region A
>     (AssignmentManager#setOffline: this.regions.remove(regionInfo)),
> 1.3 server C starts to open region A (not completed).
> for step3:
> master ServerShutdownHandler#process() for server B
> {
> ..
> splitlog()
> ...
> List<RegionState> regionsInTransition =
>     this.services.getAssignmentManager()
>         .processServerShutdown(this.serverName);
> ...
> Skip regions that were in transition unless CLOSING or PENDING_CLOSE
> ...
> assign region
> }
> {code}
> In fact, when ServerShutdownHandler#process() calls
> this.services.getAssignmentManager().processServerShutdown(this.serverName),
> region A is in RIT (step 1.3 has not completed), but the returned
> List<RegionState> regionsInTransition does not contain it, because region A
> was already removed from AssignmentManager.regions by
> AssignmentManager#setOffline in step 1.2.
> Therefore, region A will be assigned twice.
> Actually, killing and starting one server twice will also easily cause a
> region to be assigned twice.
> Besides the above cause, there is another possibility:
> when ServerShutdownHandler#process() executes MetaReader.getServerUserRegions,
> the result includes a region that is in RIT at that moment.
> But by the time MetaReader.getServerUserRegions completes, the region has
> been opened on another server and is no longer in RIT.
> In our testing environment, where balancing, moving and killing are executed
> periodically, regions are often assigned twice, and this is troublesome
> because it affects other test cases.
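
The sequence in the description (steps 1.1 to 1.3, then the shutdown handling for server B) can be replayed with a small toy model. This is only a sketch under stated assumptions: the class `DoubleAssignSketch` and its methods are hypothetical stand-ins, not the real AssignmentManager or ServerShutdownHandler code. It assumes that processServerShutdown only reports RIT regions that the `regions` map still attributes to the dead server, and that META still lists region A under server B because the open on server C has not completed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the master's bookkeeping. All names are hypothetical;
// this is NOT the real AssignmentManager or ServerShutdownHandler.
class DoubleAssignSketch {
    // region -> hosting server (stands in for AssignmentManager.regions)
    final Map<String, String> regions = new HashMap<>();
    // regions currently in transition (RIT)
    final Set<String> regionsInTransition = new HashSet<>();

    // Step 1.2: the master forgets which server hosts the region.
    void setOffline(String region) {
        regions.remove(region);
    }

    // Step 1.3: another server starts opening the region; it stays in RIT.
    void startOpen(String region) {
        regionsInTransition.add(region);
    }

    // Assumed behaviour: report only those RIT regions that the regions map
    // still attributes to the dead server.
    List<String> processServerShutdown(String deadServer) {
        List<String> rit = new ArrayList<>();
        for (Map.Entry<String, String> e : regions.entrySet()) {
            if (e.getValue().equals(deadServer)
                    && regionsInTransition.contains(e.getKey())) {
                rit.add(e.getKey());
            }
        }
        return rit;
    }

    // Regions the shutdown handler would assign: what META lists for the
    // dead server, minus the RIT regions reported above.
    List<String> regionsToAssign(String deadServer, List<String> metaRegions) {
        List<String> toAssign = new ArrayList<>(metaRegions);
        toAssign.removeAll(processServerShutdown(deadServer));
        return toAssign;
    }

    // Replays steps 1.1-1.3, then the shutdown handling for server B.
    static List<String> replay() {
        DoubleAssignSketch master = new DoubleAssignSketch();
        master.regions.put("A", "B");   // region A initially on server B
        master.setOffline("A");         // step 1.2
        master.startOpen("A");          // step 1.3, not completed
        List<String> meta = new ArrayList<>();
        meta.add("A");                  // META still lists A under server B
        return master.regionsToAssign("B", meta);
    }

    public static void main(String[] args) {
        System.out.println("regions assigned again: " + replay());
    }
}
```

In this model, `replay()` returns a list containing region A even though A is still in RIT, which corresponds to the double assignment described above. If `setOffline` had not removed A from `regions`, `processServerShutdown` would have reported it and the handler would have skipped it.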