[ https://issues.apache.org/jira/browse/HBASE-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699719#comment-13699719 ]

Chris Trezzo commented on HBASE-8599:
-------------------------------------

The logic looks good. It would be nice if we had test coverage for these corner 
cases in the ReplicationSource loop, but it is a hard one to test.

Another approach would be to make the logPositionAndCleanOldLogs method clean 
all the old logs each time instead of just the one passed. Then we wouldn't 
need to add the extra corner case in the ReplicationSource run loop and it 
would ensure that there aren't any other corner cases that could bite us. In 
the common case you would still only be cleaning one log.
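A minimal sketch of this alternative, assuming the per-peer queue of WAL names is modelled as a sorted set and that WAL names sort lexicographically in roll order. `ZkLogQueue`, `enqueue` and `cleanOldLogs` are hypothetical stand-ins, not the HBase API. Each call drops every entry older than the log being reported, so a cleanup skipped on one iteration is caught up on a later one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/**
 * Hypothetical model of one peer's replication queue as listed in ZooKeeper.
 * Not the HBase API. Assumes WAL names sort lexicographically in roll order.
 */
final class ZkLogQueue {
  private final TreeSet<String> queued = new TreeSet<>();
  private final List<String> removed = new ArrayList<>();

  void enqueue(String wal) {
    queued.add(wal);
  }

  /** Removes every queued WAL strictly older than currentWal, not just one. */
  void cleanOldLogs(String currentWal) {
    SortedSet<String> older = queued.headSet(currentWal); // live view, exclusive bound
    removed.addAll(older);
    older.clear(); // clearing the view removes the entries from the backing set
  }

  SortedSet<String> queued() {
    return new TreeSet<>(queued);
  }

  List<String> removed() {
    return new ArrayList<>(removed);
  }

  public static void main(String[] args) {
    ZkLogQueue q = new ZkLogQueue();
    for (String wal : new String[] {"wal.0001", "wal.0002", "wal.0003", "wal.0004"}) {
      q.enqueue(wal);
    }
    // Suppose earlier iterations skipped cleanup for wal.0001 through wal.0003.
    q.cleanOldLogs("wal.0004");
    System.out.println(q.queued());  // [wal.0004]
    System.out.println(q.removed()); // [wal.0001, wal.0002, wal.0003]
  }
}
```

In the common case the head set holds a single entry, so the cost is one removal. A repeated call with the same argument removes nothing, so a missed iteration is harmless as long as some later call names a newer WAL.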

Thoughts?

Otherwise +1.
                
> HLogs in ZK are not cleaned up when replication lag is minimal
> --------------------------------------------------------------
>
>                 Key: HBASE-8599
>                 URL: https://issues.apache.org/jira/browse/HBASE-8599
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>    Affects Versions: 0.98.0, 0.94.7
>            Reporter: Varun Sharma
>            Assignee: Varun Sharma
>             Fix For: 0.98.0, 0.94.10
>
>         Attachments: 8599-0.94.patch, 8599-trunk.patch, 8599-trunk-v2.patch
>
>
> On a cluster with very low replication lag (as measured by ageOfLastShippedOp
> on the source), we found HLogs accumulating and not being cleaned up as new
> WALs are rolled.
> We call logPositionAndCleanOldLogs() to clean older logs whenever the current
> WAL is no longer being written to, as indicated by currentWALisBeingWrittenTo
> being false. However, when lag is small, we may hit the following block first
> and move on to the next WAL without clearing the old WAL(s):
> ReplicationSource::run() {
>     if (readAllEntriesToReplicateOrNextFile(currentWALisBeingWrittenTo = false)) {
>         // If we are here, then we advance to the next WAL without any cleaning
>         // and close the existing WAL
>         continue;
>     }
>     // Ship some edits and call logPositionAndCleanOldLogs
> }
> If we hit readAllEntriesToReplicateOrNextFile(false) only once, the older
> logs are not cleaned out and persist in the ZooKeeper node, because we simply
> call "continue" and skip the subsequent logPositionAndCleanOldLogs call. If
> it is called more than once, we do end up clearing the old logs.
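A toy simulation of the control flow described above, assuming the reader is always caught up (minimal lag), so every rolled WAL is left through the `continue` path exactly once. It is not HBase code: `RunLoopModel`, `roll`, `iterate` and `cleanBeforeAdvance` are hypothetical, and `cleanBeforeAdvance` is only an assumed shape of the extra corner case in the run loop (clean before advancing). The model covers just the single-hit case, not the "called more than once" path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/**
 * Toy, hypothetical model of the ReplicationSource run loop under minimal lag.
 * Not HBase code. Assumes the reader is always caught up, so a rolled WAL
 * has no unread edits left when the loop reaches it.
 */
final class RunLoopModel {
  private final TreeSet<String> zkQueue = new TreeSet<>(); // WALs still listed in ZK
  private final List<String> wals = new ArrayList<>();     // WALs in roll order
  private int reading = 0;                                  // index of the WAL being read
  private final boolean cleanBeforeAdvance;                 // assumed shape of the fix

  RunLoopModel(boolean cleanBeforeAdvance) {
    this.cleanBeforeAdvance = cleanBeforeAdvance;
  }

  /** The region server rolls to a new WAL, which is queued for replication. */
  void roll(String wal) {
    wals.add(wal);
    zkQueue.add(wal);
  }

  /** One pass of the run loop; "return" stands in for "continue". */
  void iterate() {
    if (wals.isEmpty()) {
      return; // nothing to read yet
    }
    boolean beingWrittenTo = reading == wals.size() - 1; // only the newest WAL is live
    if (!beingWrittenTo) {
      // readAllEntriesToReplicateOrNextFile(false) returned true: the rolled WAL
      // is fully read, so the reader moves on to the next one.
      reading++;
      if (cleanBeforeAdvance) {
        cleanOlderThan(wals.get(reading)); // assumed run-loop corner case
      }
      return; // original loop: the cleanup below is skipped
    }
    // Ship edits from the live WAL, then report the position.
    logPositionAndCleanOldLogs(wals.get(reading), beingWrittenTo);
  }

  private void logPositionAndCleanOldLogs(String wal, boolean beingWrittenTo) {
    // Position bookkeeping omitted. Old logs are only cleaned once the current
    // WAL is no longer being written to, which never holds here under low lag.
    if (!beingWrittenTo) {
      cleanOlderThan(wal);
    }
  }

  private void cleanOlderThan(String wal) {
    zkQueue.headSet(wal).clear();
  }

  List<String> zkQueue() {
    return new ArrayList<>(zkQueue);
  }

  public static void main(String[] args) {
    for (boolean patched : new boolean[] {false, true}) {
      RunLoopModel m = new RunLoopModel(patched);
      m.roll("wal.0001");
      m.iterate(); // ship from the live wal.0001
      for (String next : new String[] {"wal.0002", "wal.0003"}) {
        m.roll(next); // WAL roll while the reader is already caught up
        m.iterate();  // EOF on the rolled WAL: advance via the early return
        m.iterate();  // ship from the new live WAL
      }
      System.out.println((patched ? "patched: " : "original: ") + m.zkQueue());
    }
  }
}
```

Running `main` leaves `[wal.0001, wal.0002, wal.0003]` queued for the original loop and only `[wal.0003]` for the patched one, which is the accumulation reported in this issue.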
