[jira] [Commented] (HBASE-22784) OldWALs not cleared in a replication slave cluster (cyclic replication bw 2 clusters)

Wellington Chevreuil (JIRA) Thu, 08 Aug 2019 16:48:19 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-22784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903419#comment-16903419
 ]


Wellington Chevreuil commented on HBASE-22784:
----------------------------------------------

Third patch, fixing the problem causing previous build test failures. Issue was 
that tests were bulkloading edits targeted to replication at T1, then also 
bulkloading entries not targeted to replication at T2, with target cluster 
down. Because of my changes, we were always updating log position once a 
not-to-be replicated entry was coming. In this case, we had updated current log 
position to the entries from T2, but entries from T1 didn't get replicated yet, 
because destination cluster is down. Then, if source and target cluster are 
restarted, entries from T1 will never get replicated, because log position is 
now above those entries. 

Last patch added extra check to advance log position only if there's no entry 
currently pending replication. 

> OldWALs not cleared in a replication slave cluster (cyclic replication bw 2 
> clusters)
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-22784
>                 URL: https://issues.apache.org/jira/browse/HBASE-22784
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, Replication
>    Affects Versions: 1.4.9, 1.4.10
>            Reporter: Solvannan R M
>            Assignee: Wellington Chevreuil
>            Priority: Blocker
>             Fix For: 1.5.0
>
>         Attachments: HBASE-22784.branch-1.001.patch, 
> HBASE-22784.branch-1.002.patch, HBASE-22784.branch-1.003.patch
>
>
> When a cluster is passive (receiving edits only via replication) in a cyclic 
> replication setup of 2 clusters, OldWALs size keeps on growing. On analysing, 
> we observed the following behaviour.
>  # New entry is added to WAL (Edit replicated from other cluster).
>  # ReplicationSourceWALReaderThread(RSWALRT) reads and applies the configured 
> filters (due to cyclic replication setup, ClusterMarkingEntryFilter discards 
> new entry from other cluster).
>  # Entry is null, RSWALRT neither updates the batch stats 
> (WALEntryBatch.lastWalPosition) nor puts it in the entryBatchQueue.
>  # ReplicationSource thread is blocked in entryBachQueue.take().
>  # So ReplicationSource#updateLogPosition has never invoked and WAL file is 
> never cleared from ReplicationQueue.
>  # Hence LogCleaner on the master, doesn't deletes the oldWAL files from 
> hadoop.
> NOTE: When a new edit is added via hbase-client, ReplicationSource thread 
> process and clears the oldWAL files from replication queues and hence master 
> cleans up the WALs
> Please provide us a solution
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

[jira] [Commented] (HBASE-22784) OldWALs not cleared in a replication slave cluster (cyclic replication bw 2 clusters)

Reply via email to