[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151831#comment-13151831 ]
ramkrishna.s.vasudevan commented on HBASE-4739:
-----------------------------------------------

@Gao Good work. One suggestion: for the M_ZK_REGION_PENDING_CLOSE case in processRegionsInTransition(), I think
{code}
if (isOnDeadServer(regionInfo, deadServers) &&
    (data.getOrigin() == null || !serverManager.isServerOnline(data.getOrigin()))) {
  // If it was on a dead server, it's closed now. Force it to OFFLINE; this
  // will get it reassigned if appropriate.
  forceOffline(regionInfo, data);
}
{code}
is needed, as we do for RS_ZK_CLOSING.

Consider the case where the master creates the node in PENDING_CLOSE but goes down before sending the close call to the RS, and the RS also goes down before the master comes back up. With the current patch, processRegionInTransition() will once again try to unassign() the region for M_ZK_REGION_PENDING_CLOSE, which cannot succeed because the RS is gone. So it is better to force it OFFLINE. What do you think, Gao?

I still feel we could just make an RPC call to the RS instead of handling a new state, because the close call cannot reach the RS twice anyway.

> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, HBASE-4739_Trunk.patch,
> HBASE-4739_Trunk_V2.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when
> the master died it had just created the RIT znode for a region but didn't
> tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare
> unless you hit other bugs while trying out stuff to root bugs out.
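The recovery rule proposed in the comment for a region found in PENDING_CLOSE when the master restarts can be sketched as a small decision function. This is a minimal model under stated assumptions, not HBase code: PendingCloseRecovery, onPendingClose and the Action enum are hypothetical names, servers are plain strings, and the two sets stand in for the deadServers map and serverManager.isServerOnline(). It only mirrors the shape of the quoted check (host on a dead server AND origin null or offline means force OFFLINE).

```java
import java.util.Set;

// Hypothetical model of the PENDING_CLOSE handling suggested in the comment;
// none of these names exist in HBase.
final class PendingCloseRecovery {

    // Hypothetical outcomes for one region found in transition at master startup.
    enum Action { FORCE_OFFLINE, RETRY_UNASSIGN }

    static Action onPendingClose(String hostingServer, String origin,
                                 Set<String> deadServers, Set<String> onlineServers) {
        // Stand-in for isOnDeadServer(regionInfo, deadServers).
        boolean onDeadServer = deadServers.contains(hostingServer);
        // Stand-in for data.getOrigin() == null || !serverManager.isServerOnline(origin).
        // The null check runs first, so contains() is never called with null.
        boolean originGone = origin == null || !onlineServers.contains(origin);
        if (onDeadServer && originGone) {
            // Nobody is left to complete the close: treat the region as closed
            // and force it OFFLINE so normal assignment can pick it up.
            return Action.FORCE_OFFLINE;
        }
        // A live server may still act on it: try the unassign (close) again.
        return Action.RETRY_UNASSIGN;
    }

    public static void main(String[] args) {
        Set<String> dead = Set.of("rs1");
        Set<String> online = Set.of("rs2");
        // Master and RS both went down before the close call was sent.
        System.out.println(onPendingClose("rs1", null, dead, online));
        // Hosting RS is alive, so retrying the close can still succeed.
        System.out.println(onPendingClose("rs2", "rs2", dead, online));
    }
}
```

Running main prints FORCE_OFFLINE and then RETRY_UNASSIGN. Always returning RETRY_UNASSIGN would correspond to the current patch's behaviour described in the comment, which keeps trying to unassign() a region whose RS is already gone.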