[ https://issues.apache.org/jira/browse/HBASE-16135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15357506#comment-15357506 ]
Gary Helmling commented on HBASE-16135:
---------------------------------------
In ReplicationSourceManager.removePeer(), I see that you've reduced the scope
of {{synchronized (this.replicationPeers)}}. Does this open up a potential
race condition with ReplicationSourceManager.recordLog()? For example, we exit
the synchronized block in removePeer, then run through the synchronized block
in recordLog and see the peer as still connected, though removePeer will later
remove it from the connected peer clusters. I'm not very familiar with this
code, just trying to understand the impact of this change in synchronization.
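A minimal Java sketch of the interleaving described in the comment, under a simplified model. `PeerRaceSketch`, `removePeerQueues`, `disconnectPeer`, `removePeerAtomically`, and `hasQueue` are hypothetical names, not HBase's ReplicationSourceManager API. A map of lists stands in for the per-peer replication queue znodes, and the assumption that the narrowed critical section in removePeer() only deletes the peer's queue is illustrative. The steps are called in a fixed order so the interleaving is reproducible without threads.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of the race; NOT HBase's ReplicationSourceManager.
 * "queues" stands in for per-peer replication queue znodes,
 * "connectedPeers" for the connected peer clusters.
 */
public class PeerRaceSketch {
  private final Object replicationPeers = new Object(); // stand-in for the narrowed lock
  private final Set<String> connectedPeers = new HashSet<>();
  private final Map<String, List<String>> queues = new HashMap<>();

  public PeerRaceSketch(String... peers) {
    for (String p : peers) {
      connectedPeers.add(p);
      queues.put(p, new ArrayList<>());
    }
  }

  /** Like recordLog(): enqueue a WAL for every peer that still looks connected. */
  public void recordLog(String wal) {
    synchronized (replicationPeers) {
      for (String peer : connectedPeers) {
        // Recreates the queue if it was already deleted.
        queues.computeIfAbsent(peer, k -> new ArrayList<>()).add(wal);
      }
    }
  }

  /** Narrowed removePeer(), first critical section (assumed): delete the queue. */
  public void removePeerQueues(String peer) {
    synchronized (replicationPeers) {
      queues.remove(peer);
    }
  }

  /** Narrowed removePeer(), later critical section: disconnect the peer. */
  public void disconnectPeer(String peer) {
    synchronized (replicationPeers) {
      connectedPeers.remove(peer);
    }
  }

  /** Wider-scope removePeer(): both steps inside one critical section. */
  public void removePeerAtomically(String peer) {
    synchronized (replicationPeers) {
      connectedPeers.remove(peer);
      queues.remove(peer);
    }
  }

  public boolean hasQueue(String peer) {
    synchronized (replicationPeers) {
      return queues.containsKey(peer);
    }
  }

  public static void main(String[] args) {
    // Narrowed locking: recordLog() runs between the two critical sections.
    PeerRaceSketch racy = new PeerRaceSketch("peer1", "peer2");
    racy.removePeerQueues("peer2");
    racy.recordLog("wal.1"); // still sees peer2 as connected
    racy.disconnectPeer("peer2");
    System.out.println("narrowed: orphaned queue for peer2 = " + racy.hasQueue("peer2"));

    // Single critical section: recordLog() cannot observe a half-removed peer.
    PeerRaceSketch safe = new PeerRaceSketch("peer1", "peer2");
    safe.removePeerAtomically("peer2");
    safe.recordLog("wal.1");
    System.out.println("atomic:   orphaned queue for peer2 = " + safe.hasQueue("peer2"));
  }
}
```

In this model the narrowed version prints `true` and the single-critical-section version prints `false`: once recordLog() can observe the peer between the two lock acquisitions, it recreates a queue that nothing later removes, which resembles the leftover-znode symptom described in the issue.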
> PeerClusterZnode under rs of removed peer may never be deleted
> --------------------------------------------------------------
>
> Key: HBASE-16135
> URL: https://issues.apache.org/jira/browse/HBASE-16135
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.5, 1.2.2, 0.98.20
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.6, 0.98.21, 1.2.3
>
> Attachments: HBASE-16135-0.98.patch, HBASE-16135-branch-1.1.patch,
> HBASE-16135-branch-1.2.patch, HBASE-16135-branch-1.patch,
> HBASE-16135-v1.patch, HBASE-16135.patch
>
>
> One of our clusters ran out of space recently, and we found that the .oldlogs
> directory was almost the same size as the data directory.
> We eventually found the cause: we removed a peer about 3 months ago,
> but some replication queue znodes were still present under some rs nodes. This
> prevents the deletion of .oldlogs.
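Why a leftover queue blocks cleanup can be sketched with a simplified cleaner, assuming the rule that an archived WAL may be deleted only if no replication queue under any region server still lists it. `OldLogsCleanerSketch` and `deletable` are hypothetical names; HBase's own ReplicationLogCleaner works on a similar principle but differs in detail (it reads the queues from ZooKeeper).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical, simplified .oldlogs cleaner; not HBase code. */
public class OldLogsCleanerSketch {

  /**
   * Returns the archived WALs that may be deleted: those not referenced by any
   * replication queue. queuesByRegionServer maps rs name -> (peer id -> queued WALs).
   */
  public static List<String> deletable(List<String> oldLogs,
      Map<String, Map<String, List<String>>> queuesByRegionServer) {
    Set<String> referenced = new HashSet<>();
    for (Map<String, List<String>> peerQueues : queuesByRegionServer.values()) {
      for (List<String> wals : peerQueues.values()) {
        referenced.addAll(wals);
      }
    }
    List<String> result = new ArrayList<>();
    for (String wal : oldLogs) {
      if (!referenced.contains(wal)) {
        result.add(wal);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> oldLogs = Arrays.asList("wal.1", "wal.2", "wal.3");
    Map<String, Map<String, List<String>>> queues = new TreeMap<>();

    // Peer "2" was removed, but rs1 still has a leftover queue for it.
    queues.put("rs1", Collections.singletonMap("2", Arrays.asList("wal.1", "wal.2")));
    System.out.println(deletable(oldLogs, queues)); // [wal.3]

    // Once the leftover queue is gone, every archived WAL becomes deletable.
    queues.put("rs1", Collections.<String, List<String>>emptyMap());
    System.out.println(deletable(oldLogs, queues)); // [wal.1, wal.2, wal.3]
  }
}
```

Under this rule, any WAL still listed in an orphaned queue is retained indefinitely, because no replication source exists to consume and remove that queue.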
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)