[ https://issues.apache.org/jira/browse/HBASE-27214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580858#comment-17580858 ]
Duo Zhang commented on HBASE-27214:
-----------------------------------
After reconsideration, I think AddPeerProcedure could race with
ReplicationLogCleaner. Consider this scenario:
1. ReplicationLogCleaner starts, and we load the current replication peers.
2. AddPeerProcedure adds a new replication peer in the disabled state.
3. The region server rolls a new WAL file, and all the other replication peers
have finished replicating it, so ReplicationLogCleaner decides to delete the
WAL file.
4. But the peer added in #2 still needs to replicate this WAL file, so once it
is deleted, the edits in it never reach that peer.
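To make the interleaving concrete, here is a minimal single-threaded sketch.
Everything in it is hypothetical: `StalePeerSnapshotDemo`, its methods, and the
map from peer id to not-yet-replicated WAL names are illustrative stand-ins,
not HBase classes or the real replication queue storage. It assumes that a
newly added peer, even a disabled one, gets every WAL rolled after it exists
queued for replication.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the race; not HBase code.
public class StalePeerSnapshotDemo {

  // Hypothetical replication state: peer id -> WAL files that peer still has to replicate.
  private final Map<String, Set<String>> pendingWals = new HashMap<>();

  public void addPeer(String peerId) {
    // Assumption: a new peer, even a disabled one, must replicate WALs rolled after it was added.
    pendingWals.put(peerId, new HashSet<>());
  }

  public void rollWal(String wal) {
    // Every peer that exists when the WAL is rolled gets it queued.
    for (Set<String> queue : pendingWals.values()) {
      queue.add(wal);
    }
  }

  public void finishReplication(String peerId, String wal) {
    pendingWals.get(peerId).remove(wal);
  }

  // What the cleaner does when it starts: take a copy of the current peer ids.
  public Set<String> loadPeers() {
    return new HashSet<>(pendingWals.keySet());
  }

  // Cleaner decision: deletable if no peer in the given snapshot still needs the WAL.
  public boolean isDeletable(Set<String> peerSnapshot, String wal) {
    for (String peerId : peerSnapshot) {
      if (pendingWals.get(peerId).contains(wal)) {
        return false;
      }
    }
    return true;
  }

  // Ground truth from the live state, for comparison.
  public boolean isStillNeeded(String wal) {
    for (Set<String> queue : pendingWals.values()) {
      if (queue.contains(wal)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    StalePeerSnapshotDemo model = new StalePeerSnapshotDemo();
    model.addPeer("peer1");
    Set<String> snapshot = model.loadPeers();   // 1. cleaner starts and loads peers
    model.addPeer("peer2");                     // 2. a disabled peer is added
    model.rollWal("wal-2");                     // 3. WAL roll; peer1 replicates it
    model.finishReplication("peer1", "wal-2");
    boolean deletable = model.isDeletable(snapshot, "wal-2");
    boolean needed = model.isStillNeeded("wal-2"); // 4. peer2 still needs it
    System.out.println("cleaner would delete wal-2: " + deletable);
    System.out.println("wal-2 still needed by a peer: " + needed);
  }
}
```

Running `main` prints `true` on both lines: the cleaner's stale snapshot
`{peer1}` approves deleting `wal-2` while `peer2` still has it queued.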
The problem is that in #3 we do not know that a new peer was added in #2.
We could reload the replication peers every time we decide whether to delete a
file, but since there is no fence, it is still theoretically possible that,
right after we reload the peers, a new peer is added and a WAL roll happens,
which leads us to the same situation.
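The same kind of hypothetical model shows why reloading alone does not help.
Here `isDeletable` reloads the peer list on every call, and the
`betweenReloadAndCheck` hook (an illustrative device, not an HBase API) plays
the role of the other actors running in the unfenced window between the reload
and the decision.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model; not HBase code.
public class ReloadWithoutFenceDemo {

  // Hypothetical replication state: peer id -> WAL files not yet replicated by that peer.
  public final Map<String, Set<String>> pendingWals = new HashMap<>();

  // Cleaner variant that reloads the peer list for every decision. The hook stands in
  // for whatever AddPeerProcedure, the WAL roller and replication sources do between
  // the reload and the check; nothing fences them out.
  public boolean isDeletable(String wal, Runnable betweenReloadAndCheck) {
    Set<String> peers = new HashSet<>(pendingWals.keySet()); // fresh reload
    betweenReloadAndCheck.run();                             // unfenced window
    for (String peerId : peers) {
      if (pendingWals.get(peerId).contains(wal)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    ReloadWithoutFenceDemo model = new ReloadWithoutFenceDemo();
    model.pendingWals.put("peer1", new HashSet<>());
    boolean deletable = model.isDeletable("wal-2", () -> {
      model.pendingWals.put("peer2", new HashSet<>());         // new disabled peer
      model.pendingWals.values().forEach(q -> q.add("wal-2")); // WAL roll queues wal-2 for all peers
      model.pendingWals.get("peer1").remove("wal-2");          // peer1 finishes replicating it
    });
    System.out.println("deletable: " + deletable);
    System.out.println("peer2 still needs wal-2: " + model.pendingWals.get("peer2").contains("wal-2"));
  }
}
```

Both lines print `true`: the reload happened, but it was already stale by the
time the decision was made.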
A possible solution is to add a barrier that prevents AddPeerProcedure and
ReplicationLogCleaner from running at the same time. I will look into whether
there are better solutions.
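A rough sketch of the barrier idea, assuming a plain
`java.util.concurrent.locks.ReentrantLock` as the barrier. The names
(`PeerCleanerBarrierDemo`, `addPeer`, `tryAddPeer`, `cleanerRun`) are
hypothetical, and the sketch does not model how the master would actually
coordinate AddPeerProcedure with the cleaner chore. A cleaner run holds the
barrier from loading the peers until its last decision, and adding a peer must
take the same barrier, so the peer set is fixed for the whole run; WAL rolls
and replication progress are not blocked.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a barrier between adding peers and cleaning WALs; not HBase code.
public class PeerCleanerBarrierDemo {

  // Hypothetical replication state: peer id -> WAL files not yet replicated by that peer.
  private final Map<String, Set<String>> pendingWals = new ConcurrentHashMap<>();
  // Hypothetical barrier shared by the add-peer path and the cleaner.
  private final ReentrantLock peerChangeBarrier = new ReentrantLock();

  // Add-peer path: waits while a cleaner run holds the barrier.
  public void addPeer(String peerId) {
    peerChangeBarrier.lock();
    try {
      pendingWals.put(peerId, ConcurrentHashMap.newKeySet());
    } finally {
      peerChangeBarrier.unlock();
    }
  }

  // Same as addPeer, but gives up instead of waiting; used only to observe the barrier.
  public boolean tryAddPeer(String peerId) {
    if (!peerChangeBarrier.tryLock()) {
      return false;
    }
    try {
      pendingWals.put(peerId, ConcurrentHashMap.newKeySet());
      return true;
    } finally {
      peerChangeBarrier.unlock();
    }
  }

  // WAL rolls and replication progress do not take the barrier.
  public void rollWal(String wal) {
    pendingWals.values().forEach(queue -> queue.add(wal));
  }

  public void finishReplication(String peerId, String wal) {
    pendingWals.get(peerId).remove(wal);
  }

  // One cleaner run: peers are loaded and every decision is made under the barrier,
  // so no peer can be added in between. duringRun executes while the barrier is held.
  public Set<String> cleanerRun(Set<String> candidateWals, Runnable duringRun) {
    peerChangeBarrier.lock();
    try {
      Set<String> peers = new HashSet<>(pendingWals.keySet());
      duringRun.run();
      Set<String> deletable = new HashSet<>();
      for (String wal : candidateWals) {
        boolean needed = peers.stream().anyMatch(p -> pendingWals.get(p).contains(wal));
        if (!needed) {
          deletable.add(wal);
        }
      }
      return deletable;
    } finally {
      peerChangeBarrier.unlock();
    }
  }

  public static void main(String[] args) {
    PeerCleanerBarrierDemo model = new PeerCleanerBarrierDemo();
    model.addPeer("peer1");
    boolean[] addedDuringRun = new boolean[1];
    Set<String> deleted = model.cleanerRun(Set.of("wal-1"), () -> {
      // Another thread tries to add a peer while the cleaner holds the barrier.
      Thread adder = new Thread(() -> addedDuringRun[0] = model.tryAddPeer("peer2"));
      adder.start();
      try {
        adder.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    System.out.println("peer added during cleaner run: " + addedDuringRun[0]);
    System.out.println("deleted in first run: " + deleted);

    // After the run the peer can be added, and the next run sees it.
    model.addPeer("peer2");
    model.rollWal("wal-2");
    model.finishReplication("peer1", "wal-2");
    System.out.println("deleted in second run: " + model.cleanerRun(Set.of("wal-2"), () -> {}));
  }
}
```

The first run deletes `wal-1` and the concurrent `tryAddPeer` is refused; once
`peer2` is added after the run, the second run keeps `wal-2`. The trade-off is
that adding a peer may have to wait for a whole cleaner run to finish.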
Thanks.
> Implement the new replication hfile/log cleaner
> -----------------------------------------------
>
> Key: HBASE-27214
> URL: https://issues.apache.org/jira/browse/HBASE-27214
> Project: HBase
> Issue Type: Sub-task
> Components: master, Replication
> Reporter: Duo Zhang
> Assignee: LiangJun He
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)