[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151831#comment-13151831 ]

ramkrishna.s.vasudevan commented on HBASE-4739:
-----------------------------------------------

@Gao
Good work. One suggestion:
for the M_ZK_REGION_PENDING_CLOSE case in processRegionsInTransition(), I think
the following check
{code}
if (isOnDeadServer(regionInfo, deadServers) &&
    (data.getOrigin() == null ||
     !serverManager.isServerOnline(data.getOrigin()))) {
  // If the region was on a dead server, it's closed now. Force it OFFLINE
  // so that it gets reassigned if appropriate.
  forceOffline(regionInfo, data);
}
{code}
is needed, as we already do for RS_ZK_CLOSING.
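A minimal, self-contained sketch of the decision this check makes, assuming simplified stand-ins for HBase's types. The class `PendingCloseDecision`, the `Action` enum and the use of plain strings for server names are hypothetical and are not HBase's API. It also assumes, per the comment, that the current patch's fallback for PENDING_CLOSE is to call unassign() again.

```java
import java.util.Set;

/**
 * Hypothetical model of the M_ZK_REGION_PENDING_CLOSE handling discussed above.
 * Not HBase code: servers are plain strings and the outcomes are an enum.
 */
public class PendingCloseDecision {

    /** What a restarted master might do with a region found in PENDING_CLOSE. */
    enum Action { FORCE_OFFLINE, RETRY_UNASSIGN }

    /**
     * @param regionServer  server the region was last assigned to
     * @param origin        server recorded in the znode data, or null if unknown
     * @param deadServers   servers the master knows to be dead
     * @param onlineServers servers currently registered with the master
     */
    static Action decide(String regionServer, String origin,
                         Set<String> deadServers, Set<String> onlineServers) {
        boolean onDeadServer = deadServers.contains(regionServer);
        boolean originGone = origin == null || !onlineServers.contains(origin);
        if (onDeadServer && originGone) {
            // No live server can act on the close, so force the region OFFLINE
            // and let normal assignment pick it up again.
            return Action.FORCE_OFFLINE;
        }
        // Assumed fallback (the current patch): try to unassign again.
        return Action.RETRY_UNASSIGN;
    }

    public static void main(String[] args) {
        Set<String> dead = Set.of("rs1,62003,1");
        Set<String> online = Set.of("rs2,62003,2");
        // Master created the PENDING_CLOSE znode, then both master and rs1 died.
        System.out.println(decide("rs1,62003,1", null, dead, online));          // FORCE_OFFLINE
        // The region's server is still alive, so retrying the close can work.
        System.out.println(decide("rs2,62003,2", "rs2,62003,2", dead, online)); // RETRY_UNASSIGN
    }
}
```

Without the dead-server check, the first case would fall through to RETRY_UNASSIGN. That is the loop the comment warns about: an unassign aimed at a server that no longer exists.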

Consider the case where the master creates the node in PENDING_CLOSE but goes
down before sending the close call to the RS, and the RS also goes down before
the master comes back up. With the current patch, processRegionInTransition()
will once again try to unassign() a region in M_ZK_REGION_PENDING_CLOSE, and
that unassign will never complete. So it is better to force it OFFLINE. What do
you think, Gao?
I still feel we could just make the RPC call to the RS instead of handling a new
state, because on the RS side a close call for the same region cannot be
processed twice anyway.
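The argument for a plain RPC rests on the RS rejecting a duplicate close. A minimal sketch of one way such a guard can work, assuming the RS tracks in-progress closes in a concurrent set keyed by encoded region name. `CloseGuard`, `requestClose` and `finishClose` are hypothetical names, not the HRegionServer API.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: a region server that accepts at most one close per region at a time. */
public class CloseGuard {
    // Encoded names of regions whose close is currently in progress.
    private final Set<String> closing = ConcurrentHashMap.newKeySet();

    /** Returns true if this call starts the close, false if one is already running. */
    public boolean requestClose(String encodedRegionName) {
        // add() on a concurrent key set is atomic, so two concurrent RPCs cannot both win.
        return closing.add(encodedRegionName);
    }

    /** Called once the region is fully closed, so a later reopen can be closed again. */
    public void finishClose(String encodedRegionName) {
        closing.remove(encodedRegionName);
    }

    public static void main(String[] args) {
        CloseGuard rs = new CloseGuard();
        String region = "f76899564cabe7e9857c3aeb526ec9dc";
        System.out.println(rs.requestClose(region)); // true: the first close proceeds
        System.out.println(rs.requestClose(region)); // false: the duplicate is rejected
        rs.finishClose(region);
    }
}
```

Under this assumption, a restarted master can simply resend the close RPC. If the first call did arrive, the repeat is a harmless no-op, so no extra ZK state is needed to tell the two cases apart.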
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, HBASE-4739_Trunk.patch, 
> HBASE-4739_Trunk_V2.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday: when
> the master died, it had just created the RIT znode for a region but hadn't yet
> told the RS to close it.
> When the master restarted, it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> Neither is ever going to happen, and the stuck region is blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare
> unless you hit other bugs while trying things out to root out bugs.
