[
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165085#comment-13165085
]
Chris Trezzo commented on HBASE-2611:
-------------------------------------
@J-D
LarsH and I were talking about another approach to region server replication
hlog queue failover yesterday, and I wanted to get some feedback on it.
Currently when handling a nodeDeleted event, the live region servers only
attempt to failover the node corresponding to the event. The nodeDeleted event
is only fired once, so to protect ourselves from orphaning the znode state of
the failed region server in a cascading failure scenario, we move the state to
the znode of the region server that is performing the failover. Since we don't
have an atomic way to move this state, it gets a little tricky.
Instead of this approach, we could have the region server attempt to failover
all failed region servers every time it receives a nodeDeleted event. For
example, the nodeDeleted method could go something like this: refresh the
region server list, get the list of region servers in the replication znode
structure, attempt to lock and failover any region server listed in the
replication znode structure that is not currently alive.
The same race to lock the region server znode will occur. Only one region
server will get the lock and handle the failover. Each NodeFailoverWorker that
gets started could simply operate on the original dead region server znode
structure. If the region server fails while preforming the failover, then both
the region servers will get picked up by another region server when the
nodeDeleted event for the second failure is fired. Locks would have to be
ephemeral nodes to prevent permanent locking of a region server when the
failover region server dies. Once the replication hlog queues are successfully
replicated, the znode for the dead region server can be deleted.
On the cons side, this approach makes the handling of a nodeDeleted event a
heavier weight operation.
On the pros side, it makes the failover code much simpler because we no longer
have to worry about moving the region server znode state around in zookeeper.
Thoughts always appreciated.
Thanks,
Chris
> Handle RS that fails while processing the failure of another one
> ----------------------------------------------------------------
>
> Key: HBASE-2611
> URL: https://issues.apache.org/jira/browse/HBASE-2611
> Project: HBase
> Issue Type: Sub-task
> Components: replication
> Reporter: Jean-Daniel Cryans
> Assignee: Jean-Daniel Cryans
>
> HBASE-2223 doesn't manage region servers that fail while doing the transfer
> of HLogs queues from other region servers that failed. Devise a reliable way
> to do it.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira