[ https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165085#comment-13165085 ]

Chris Trezzo commented on HBASE-2611:
-------------------------------------

@J-D

LarsH and I were talking about another approach to region server replication 
hlog queue failover yesterday, and I wanted to get some feedback on it.

Currently, when handling a nodeDeleted event, the live region servers only 
attempt to fail over the region server corresponding to the event. The 
nodeDeleted event is fired only once, so to avoid orphaning the znode state of 
the failed region server in a cascading failure scenario, we move that state to 
the znode of the region server performing the failover. Since we don't have an 
atomic way to move this state, it gets a little tricky.
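To make the "tricky" part concrete, here is a minimal Java sketch of one 
hazard, assuming a simplified in-memory model of the replication znodes 
(QueueMoveSketch, moveQueues and the queue naming are hypothetical, not the 
actual HBASE-2223 code). Without an atomic move, the state has to be copied 
and then deleted as two separate writes, so if the failover region server dies 
in between, the same queue ends up in two places.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical in-memory model of the current approach, not HBase code: the
 * failover server copies the dead server's hlog queues under its own znode,
 * then deletes the dead server's znode. The two steps are separate writes.
 */
class QueueMoveSketch {
    // server name -> ids of the hlog queues stored under that server's replication znode
    final Map<String, Set<String>> tree = new TreeMap<>();

    /** crashBetweenSteps simulates the failover server dying right after step 1. */
    void moveQueues(String dead, String self, boolean crashBetweenSteps) {
        Set<String> queues = tree.get(dead);
        if (queues == null) {
            return; // nothing left to move
        }
        // Step 1: copy each queue under our own znode, tagged with the dead server's name.
        Set<String> mine = tree.computeIfAbsent(self, k -> new TreeSet<>());
        for (String q : queues) {
            mine.add(q + "-" + dead);
        }
        if (crashBetweenSteps) {
            return; // step 2 never runs: the queues now exist in two places
        }
        // Step 2: delete the dead server's znode.
        tree.remove(dead);
    }
}
```

Copying before deleting avoids losing queues on a crash, but at the cost of 
leaving duplicate state that some later failover has to reconcile.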

Instead of this approach, we could have the region server attempt to fail over 
all failed region servers every time it receives a nodeDeleted event. For 
example, the nodeDeleted method could go something like this:

  1. Refresh the list of live region servers.
  2. Get the list of region servers in the replication znode structure.
  3. Attempt to lock and fail over every region server listed in the 
     replication znode structure that is not currently alive.
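A minimal Java sketch of that handler, assuming the live list and the 
replication znode children have already been read into memory 
(FailoverScanSketch, onNodeDeleted and tryLock are hypothetical names; only 
nodeDeleted and NodeFailoverWorker come from the existing code). The key 
difference is that it ignores which node was deleted and scans for every dead 
server that still has replication state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of the proposed nodeDeleted handling. The fields are an
 * in-memory stand-in for ZooKeeper; class and method names are illustrative,
 * not HBase APIs.
 */
class FailoverScanSketch {
    // Region servers currently registered as alive.
    final Set<String> liveServers = new HashSet<>();
    // Region servers that still have hlog queues under the replication znode.
    final Set<String> replicationServers = new TreeSet<>();
    // Dead server -> region server holding its failover lock.
    final Map<String, String> locks = new HashMap<>();

    /** Only the first caller for a given dead server gets the lock. */
    boolean tryLock(String deadServer, String self) {
        return locks.putIfAbsent(deadServer, self) == null;
    }

    /**
     * Runs on every nodeDeleted event, whichever znode was deleted. Returns the
     * dead servers this region server locked; each would get a NodeFailoverWorker.
     */
    List<String> onNodeDeleted(String self) {
        List<String> claimed = new ArrayList<>();
        // Steps 1-2: liveServers and replicationServers are assumed freshly read.
        for (String rs : replicationServers) {
            // Step 3: any server with replication state that is not alive is a candidate.
            if (!liveServers.contains(rs) && tryLock(rs, self)) {
                claimed.add(rs);
            }
        }
        return claimed;
    }
}
```

For instance, with rs0 and rs1 both dead, a single nodeDeleted event lets rs2 
claim both of them, and another live server handling the same event gets 
nothing because the locks are already taken.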

The same race to lock the region server znode will occur: only one region 
server will get the lock and handle the failover. Each NodeFailoverWorker that 
gets started could simply operate on the original dead region server's znode 
structure. If the region server performing the failover fails partway through, 
then both dead region servers (the original one and the one that was handling 
its failover) will get picked up by another region server when the nodeDeleted 
event for the second failure is fired. Locks would have to be ephemeral nodes 
so that a dead region server does not stay locked forever when the region 
server handling its failover dies. Once the replication hlog queues are 
successfully replicated, the znode for the dead region server can be deleted.
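A sketch of the cascading case in Java, again assuming an in-memory model 
rather than real ZooKeeper calls (EphemeralLockSketch and its methods are 
hypothetical). With the actual ZooKeeper client, each lock would presumably be 
a znode created with CreateMode.EPHEMERAL, where a losing create fails with 
KeeperException.NodeExistsException; here expire() stands in for session 
expiry removing those ephemeral nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical in-memory model of ephemeral failover locks (not the ZooKeeper
 * client API). A lock is owned by a server's session and disappears when that
 * session expires, so a dead server can never stay locked forever.
 */
class EphemeralLockSketch {
    final Set<String> live = new HashSet<>();
    // Server -> its hlog queues under the replication znode.
    final Map<String, List<String>> replication = new TreeMap<>();
    // Dead server -> region server whose session holds the ephemeral lock on it.
    final Map<String, String> locks = new HashMap<>();
    // Queues that have been fully replicated, to check the outcome.
    final List<String> replicated = new ArrayList<>();

    /** Session expiry: the server is no longer alive and its ephemeral locks vanish. */
    void expire(String server) {
        live.remove(server);
        locks.values().removeIf(owner -> owner.equals(server));
    }

    /** Like creating an ephemeral lock znode: fails if it already exists. */
    boolean tryLock(String deadServer, String self) {
        return locks.putIfAbsent(deadServer, self) == null;
    }

    /** Claim every dead server that still has queues, whichever node was deleted. */
    List<String> onNodeDeleted(String self) {
        List<String> claimed = new ArrayList<>();
        for (String rs : replication.keySet()) {
            if (!live.contains(rs) && tryLock(rs, self)) {
                claimed.add(rs);
            }
        }
        return claimed;
    }

    /** After all its queues are replicated, delete the dead server's znode and lock. */
    void finishFailover(String deadServer) {
        replicated.addAll(replication.remove(deadServer));
        locks.remove(deadServer);
    }
}
```

For example, if rs1 dies and rs2 wins its lock, and rs2 then dies 
mid-failover, rs2's lock on rs1 disappears with its session. When rs3 handles 
the nodeDeleted event for rs2, onNodeDeleted returns both rs1 and rs2, and 
finishFailover removes both znodes once their queues are replicated.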

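On the cons side, this approach makes handling a nodeDeleted event a 
heavier-weight operation, since every event triggers a scan of the live region 
server list and the replication znode structure rather than just the one 
deleted node.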

On the pros side, it makes the failover code much simpler because we no longer 
have to worry about moving the region server znode state around in ZooKeeper.

Thoughts always appreciated.

Thanks,
Chris
                
> Handle RS that fails while processing the failure of another one
> ----------------------------------------------------------------
>
>                 Key: HBASE-2611
>                 URL: https://issues.apache.org/jira/browse/HBASE-2611
>             Project: HBase
>          Issue Type: Sub-task
>          Components: replication
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>
> HBASE-2223 doesn't manage region servers that fail while doing the transfer 
> of HLogs queues from other region servers that failed. Devise a reliable way 
> to do it.

--
This message is automatically generated by JIRA.
