Duo Zhang created HBASE-26029:
---------------------------------
Summary: It is not reliable to use nodeDeleted event to track
region server's death
Key: HBASE-26029
URL: https://issues.apache.org/jira/browse/HBASE-26029
Project: HBase
Issue Type: Bug
Reporter: Duo Zhang
Assignee: Duo Zhang
When implementing HBASE-26011, [~sunxin] pointed out an interesting scenario:
if a region server goes up and down between two sync requests, we cannot
detect the death of that region server.
This is a valid point, and while thinking about a solution I noticed that the
current zk implementation has the same problem. A watcher on zk can only be
triggered once, so after zk triggers the watcher and before you set a new
one, a region server may go up and down, and you will miss the nodeDeleted
event for that region server.
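To make the race concrete, here is a minimal in-memory sketch. It does not use
the real ZooKeeper client; OneShotWatchDemo, armWatch and the node names are
hypothetical, and the only behaviour modelled is the assumption that a watch
fires once and must then be set again.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of a one-shot watch; NOT the ZooKeeper API.
public class OneShotWatchDemo {
  private final Set<String> nodes = new HashSet<>();
  private boolean watchArmed = false;
  // Deletions the watcher was actually notified about.
  final List<String> seenDeletes = new ArrayList<>();

  void armWatch() { watchArmed = true; }

  void create(String node) { nodes.add(node); }

  void delete(String node) {
    // The watch is consumed by the first event it delivers.
    if (nodes.remove(node) && watchArmed) {
      watchArmed = false;
      seenDeletes.add(node);
    }
  }

  static List<String> run() {
    OneShotWatchDemo zk = new OneShotWatchDemo();
    zk.create("rs1");
    zk.armWatch();
    zk.delete("rs1");   // watcher fires and is consumed
    zk.create("rs2");   // rs2 comes up ...
    zk.delete("rs2");   // ... and dies before the watch is set again
    zk.armWatch();      // too late, the rs2 deletion was never reported
    return zk.seenDeletes;
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

In this model only the deletion of rs1 is recorded; the death of rs2 is never
observed.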
I think the general approach here, which could work for both the master based
and the zk based replication tracker, is that we should not rely on the tracker
to tell us which region server is dead. Instead, the tracker just provides the
list of live region servers, and the upper layer compares this list with the
expected list (for replication, the list should be obtained by listing
replicators) to detect the dead region servers.
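The comparison could look roughly like the sketch below. This is only an
illustration and not the actual patch: DeadServerDetector, findDead and the
server name strings are hypothetical, and plain strings stand in for whatever
type the tracker would really return.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of HBase.
public class DeadServerDetector {

  // A server is considered dead if it is expected (for example, it still
  // appears as a replicator) but is missing from the live list.
  static Set<String> findDead(Set<String> expected, Set<String> live) {
    Set<String> dead = new TreeSet<>(expected);
    dead.removeAll(live);
    return dead;
  }

  public static void main(String[] args) {
    // The suffix after '@' stands for a start time, so a restarted server
    // gets a different name than its previous incarnation.
    Set<String> replicators = Set.of("rs1@100", "rs2@100", "rs3@100");
    Set<String> live = Set.of("rs1@100", "rs3@200");
    System.out.println(findDead(replicators, live));
  }
}
```

Because the result is computed from the two lists, a server that came up and
went down between two checks is still found as long as it remains in the
expected list.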