bharathv commented on a change in pull request #3204: URL: https://github.com/apache/hbase/pull/3204#discussion_r621850651
########## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java ########## @@ -641,8 +641,8 @@ public void removePeer(String id) { "and that replication on this peer may not be caught up. peerId=" + id); } for (ReplicationSourceInterface toRemove : srcToRemove) { - toRemove.terminate(terminateMessage); closeQueue(toRemove); + toRemove.terminate(terminateMessage); Review comment: Spoke with @sandeepvinayak. So the issue here was introduced by HBASE-25583. That patch introduced a peerRemoved() cleanup call from inside of ReplicationSourceShipperThread when any ZK call to update log position throws NoNodeException, meaning the peer was removed by some other process. (Ideally the peer should eventually get cleaned up due to a notification from ZK but we have repeatedly observed that there are occasional missed ZK notifications resulting in a weird ZK state, so the patch aggressively cleans up the peer state the first time it sees this exception). Now the problem is that peerRemoved() attempts to interrupt and cleanup the ReplicationSourceShipperThread from which it is being called and that is aborting the thread half way through and not performing the cleanup. Ideally that shouldn't happen but JVM has some weird semantics about thread interrupting itself, sigh. So the right way of fixing it is to keep the existing cleanup order as is (which is correct) but instead _not_ call peer cleanup routine from inside shipper/other replication threads. Currently the way the code is structured, there is no easy way (I can think of) to schedule a peer remove event without significant plumbing. A brute force way is to schedule it in the top level executor in ReplicationSourceManager, is there a better way? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org