[jira] [Commented] (HBASE-22784) OldWALs not cleared in a replication slave cluster (cyclic replication bw 2 clusters)

Solvannan R M (JIRA) Fri, 09 Aug 2019 00:57:03 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-22784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903673#comment-16903673
 ]


Solvannan R M commented on HBASE-22784:
---------------------------------------

Hi [~wchevreuil],

Thanks for the patch! We will set up a test cluster and get back to you after 
trying it out.

We have also been analysing the patches provided. We see that 
{{logPositionAndCleanOldLogs}} is called from both 
ReplicationSourceWALReaderThread and ReplicationSourceShipperThread, so the 
shipment state is maintained by both threads. Originally it was handled only 
by the ReplicationSourceShipperThread, which avoided maintaining this state in 
two places. We have been exploring the possibility of periodically sending an 
empty batch to the shipper thread, which would handle the log position update 
and cleanup logic organically. The flow would be:

If the WAL reader thread has not produced any entry batch (after passing 
through all the filters) within some configured time threshold, it can queue an 
empty batch, carrying the last read log position, to the entryBatchQueue. The 
ReplicationSourceShipperThread will then read this empty batch, update its 
position and invoke the cleanup logic.
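
To make the proposed flow concrete, here is a minimal, self-contained sketch. 
It is not HBase code: Batch, Reader, Shipper, onReadPass, handle and 
emptyBatchIntervalMs are hypothetical names standing in for WALEntryBatch, the 
reader thread, the shipper thread and the configured threshold, and time is 
passed in explicitly so the behaviour is easy to follow.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class EmptyBatchSketch {
    /** Hypothetical stand-in for WALEntryBatch: entry count plus last read position. */
    static final class Batch {
        final int entryCount;
        final long lastWalPosition;

        Batch(int entryCount, long lastWalPosition) {
            this.entryCount = entryCount;
            this.lastWalPosition = lastWalPosition;
        }

        boolean isEmpty() {
            return entryCount == 0;
        }
    }

    /** Hypothetical reader side: queues an empty batch once the threshold has elapsed. */
    static final class Reader {
        private final BlockingQueue<Batch> entryBatchQueue;
        private final long emptyBatchIntervalMs; // assumed config value
        private long lastQueuedAtMs;

        Reader(BlockingQueue<Batch> entryBatchQueue, long emptyBatchIntervalMs, long nowMs) {
            this.entryBatchQueue = entryBatchQueue;
            this.emptyBatchIntervalMs = emptyBatchIntervalMs;
            this.lastQueuedAtMs = nowMs;
        }

        /** survivingEntries is the number of entries left after all filters ran. */
        void onReadPass(int survivingEntries, long positionAfterRead, long nowMs) {
            if (survivingEntries > 0) {
                entryBatchQueue.offer(new Batch(survivingEntries, positionAfterRead));
                lastQueuedAtMs = nowMs;
            } else if (nowMs - lastQueuedAtMs >= emptyBatchIntervalMs) {
                // Nothing to ship for a while: hand the shipper the position only.
                entryBatchQueue.offer(new Batch(0, positionAfterRead));
                lastQueuedAtMs = nowMs;
            }
        }
    }

    /** Hypothetical shipper side: the only place where position state is kept. */
    static final class Shipper {
        long loggedPosition = -1L;
        int shippedEntries = 0;

        void handle(Batch batch) {
            if (!batch.isEmpty()) {
                shippedEntries += batch.entryCount; // real code would ship the edits here
            }
            // Position update (and WAL cleanup) runs for empty and non-empty batches alike.
            loggedPosition = batch.lastWalPosition;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<Batch> queue = new LinkedBlockingQueue<>();
        Reader reader = new Reader(queue, 1000L, 0L);
        Shipper shipper = new Shipper();

        // Passive cluster case: every entry is filtered out on each read pass.
        reader.onReadPass(0, 100L, 400L);
        reader.onReadPass(0, 200L, 800L);
        reader.onReadPass(0, 300L, 1200L); // threshold reached, empty batch queued

        Batch batch;
        while ((batch = queue.poll()) != null) {
            shipper.handle(batch);
        }
        System.out.println("logged position = " + shipper.loggedPosition);
    }
}
```

With this shape the reader never touches the shipment state, so the position 
bookkeeping stays in a single thread, which is the property we are trying to 
preserve.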

Please let us know if this logic will lead to any inconsistencies. 

 

> OldWALs not cleared in a replication slave cluster (cyclic replication bw 2 
> clusters)
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-22784
>                 URL: https://issues.apache.org/jira/browse/HBASE-22784
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, Replication
>    Affects Versions: 1.4.9, 1.4.10
>            Reporter: Solvannan R M
>            Assignee: Wellington Chevreuil
>            Priority: Blocker
>             Fix For: 1.5.0
>
>         Attachments: HBASE-22784.branch-1.001.patch, 
> HBASE-22784.branch-1.002.patch, HBASE-22784.branch-1.003.patch
>
>
> When a cluster is passive (receiving edits only via replication) in a cyclic 
> replication setup of 2 clusters, OldWALs size keeps on growing. On analysing, 
> we observed the following behaviour.
>  # New entry is added to WAL (Edit replicated from other cluster).
>  # ReplicationSourceWALReaderThread(RSWALRT) reads and applies the configured 
> filters (due to cyclic replication setup, ClusterMarkingEntryFilter discards 
> new entry from other cluster).
>  # Since the entry is null, RSWALRT neither updates the batch stats 
> (WALEntryBatch.lastWalPosition) nor puts it in the entryBatchQueue.
>  # ReplicationSource thread is blocked in entryBatchQueue.take().
>  # So ReplicationSource#updateLogPosition is never invoked and the WAL file 
> is never cleared from the ReplicationQueue.
>  # Hence LogCleaner on the master does not delete the oldWAL files from 
> Hadoop.
> NOTE: When a new edit is added via hbase-client, the ReplicationSource thread 
> processes it and clears the oldWAL files from the replication queues, and 
> hence the master cleans up the WALs.
> Please provide us with a solution.
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
