[ https://issues.apache.org/jira/browse/HBASE-16135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15357506#comment-15357506 ]

Gary Helmling commented on HBASE-16135:
---------------------------------------

In ReplicationSourceManager.removePeer(), I see that you've reduced the scope 
of {{synchronized (this.replicationPeers)}}.  Does this open up a potential 
race condition with ReplicationSourceManager.recordLog()?  For example, we exit 
the synchronized block in removePeer, then run through the synchronized block 
in recordLog and see the peer as still connected, though removePeer will later 
remove it from the connected peer clusters.  I'm not very familiar with this 
code, just trying to understand the impact of this change in synchronization.
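
To make the interleaving above concrete, here is a minimal single-threaded replay of it. This is a hypothetical model, not ReplicationSourceManager's real code. The fields `connectedPeers` and `queues` and the split of removePeer into `removePeerPhase1`/`removePeerPhase2` are assumptions. Phase 1, inside the lock, is assumed to clear the peer's queue, and phase 2, after the lock is released, is assumed to disconnect it. Replaying the steps in order on one thread makes the outcome deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of the check-then-act race discussed above.
 * All names are illustrative; this is NOT the ReplicationSourceManager API.
 */
public class PeerRaceSketch {
    private final Object replicationPeers = new Object(); // stands in for the shared lock
    private final Set<String> connectedPeers = new HashSet<>();
    private final Map<String, List<String>> queues = new HashMap<>();

    PeerRaceSketch(String peerId) {
        connectedPeers.add(peerId);
        queues.put(peerId, new ArrayList<>());
    }

    // Assumed phase 1 of removePeer with the narrowed synchronized block:
    // the peer's queue is cleared while holding the lock...
    void removePeerPhase1(String peerId) {
        synchronized (replicationPeers) {
            queues.remove(peerId);
        }
    }

    // ...assumed phase 2 runs after the lock is released and disconnects the peer.
    void removePeerPhase2(String peerId) {
        connectedPeers.remove(peerId);
    }

    // recordLog: under the lock, if the peer still looks connected, a queue
    // entry is (re)created for it and the WAL is recorded.
    void recordLog(String peerId, String wal) {
        synchronized (replicationPeers) {
            if (connectedPeers.contains(peerId)) {
                queues.computeIfAbsent(peerId, k -> new ArrayList<>()).add(wal);
            }
        }
    }

    // Variant: both removal steps happen under the same lock, so recordLog sees
    // either the state before removal or the state after it, never a mix.
    void removePeerAtomically(String peerId) {
        synchronized (replicationPeers) {
            queues.remove(peerId);
            connectedPeers.remove(peerId);
        }
    }

    Map<String, List<String>> queues() {
        return queues;
    }

    public static void main(String[] args) {
        PeerRaceSketch racy = new PeerRaceSketch("peer1");
        racy.removePeerPhase1("peer1");   // removePeer exits its synchronized block
        racy.recordLog("peer1", "wal.1"); // recordLog still sees peer1 as connected
        racy.removePeerPhase2("peer1");   // removePeer later disconnects peer1
        System.out.println("racy: " + racy.queues());   // prints "racy: {peer1=[wal.1]}"

        PeerRaceSketch fixed = new PeerRaceSketch("peer1");
        fixed.removePeerAtomically("peer1");
        fixed.recordLog("peer1", "wal.1");
        System.out.println("fixed: " + fixed.queues()); // prints "fixed: {}"
    }
}
```

In this model the racy sequence leaves a queue for a peer that no longer exists, which is the kind of leftover the issue title describes. Holding the lock across the whole removal closes the window here. Whether the real patch needs that depends on what the narrowed block actually protects.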

> PeerClusterZnode under rs of removed peer may never be deleted
> --------------------------------------------------------------
>
>                 Key: HBASE-16135
>                 URL: https://issues.apache.org/jira/browse/HBASE-16135
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.5, 1.2.2, 0.98.20
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>             Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.6, 0.98.21, 1.2.3
>
>         Attachments: HBASE-16135-0.98.patch, HBASE-16135-branch-1.1.patch, 
> HBASE-16135-branch-1.2.patch, HBASE-16135-branch-1.patch, 
> HBASE-16135-v1.patch, HBASE-16135.patch
>
>
> One of our clusters ran out of space recently, and we found that the .oldlogs 
> directory was almost as large as the data directory.
> Eventually we found the cause: we had removed a peer about 3 months ago, 
> but some replication queue znodes still remained under some rs nodes. This 
> prevents the deletion of .oldlogs.
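
A minimal sketch of why one leftover queue is enough to pin .oldlogs indefinitely. It assumes a cleaner that keeps any WAL still referenced by any replication queue under any rs node, regardless of whether that queue's peer still exists. The nested map loosely mirrors the rs → peer → WAL znode layout. The class and method names (`OldLogsSketch`, `isDeletable`, `orphanedQueues`) are hypothetical and are not HBase's cleaner API.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical model of how a leftover queue blocks .oldlogs cleanup.
 * Layout assumed: region server -> peer id -> WALs still queued for that peer.
 */
public class OldLogsSketch {
    // A WAL may be deleted only if no queue (under any rs, for any peer id,
    // including peer ids that were already removed) still references it.
    static boolean isDeletable(String wal, Map<String, Map<String, List<String>>> queuesByRs) {
        for (Map<String, List<String>> queuesByPeer : queuesByRs.values()) {
            for (List<String> wals : queuesByPeer.values()) {
                if (wals.contains(wal)) {
                    return false;
                }
            }
        }
        return true;
    }

    // Queues whose peer id is no longer configured, reported as "rs/peerId".
    static Set<String> orphanedQueues(Map<String, Map<String, List<String>>> queuesByRs,
                                      Set<String> configuredPeers) {
        Set<String> orphans = new TreeSet<>();
        for (Map.Entry<String, Map<String, List<String>>> rs : queuesByRs.entrySet()) {
            for (String peerId : rs.getValue().keySet()) {
                if (!configuredPeers.contains(peerId)) {
                    orphans.add(rs.getKey() + "/" + peerId);
                }
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        // rs1 still holds a queue for peer "2", which was removed long ago.
        Map<String, Map<String, List<String>>> queuesByRs = Map.of(
            "rs1", Map.of("1", List.of(), "2", List.of("wal.1", "wal.2")),
            "rs2", Map.of("1", List.of()));
        Set<String> configuredPeers = Set.of("1");

        System.out.println(isDeletable("wal.1", queuesByRs));              // false: pinned by the orphan
        System.out.println(isDeletable("wal.3", queuesByRs));              // true
        System.out.println(orphanedQueues(queuesByRs, configuredPeers));   // [rs1/2]
    }
}
```

Under this assumption, nothing ever drains the orphaned queue, so every WAL it lists stays in .oldlogs until the znode itself is deleted.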



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
