[
https://issues.apache.org/jira/browse/HBASE-9888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814686#comment-13814686
]
santosh banerjee commented on HBASE-9888:
-----------------------------------------
Did you consider writing a replication controller to be deployed on the sink
cluster?
The idea is to implement an edit-filter plugin as a ReplicationSinkService
implementation that can be defined in hbase-site.xml as follows:
<property>
<name>hbase.replication.sink.service</name>
<value>com......replication.ReplicationController</value>
</property>
You could extend the default ReplicationSinkService implementation,
org.apache.hadoop.hbase.replication.regionserver.Replication, and override
only the replicateLogEntries() method to apply your filtering logic. For your
specific use case, the sink cluster needs to know when it was added as a peer.
Then, assuming the two clusters are time-synced, the custom sink service can
filter edits against that timestamp and drop the ones that were created
before it.
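To make the filtering step concrete, here is a minimal sketch of the timestamp
check only. It deliberately does not use the HBase API: Edit and
PeerCreationFilter are hypothetical stand-ins, and peerAddedMillis is assumed
to be a value the sink cluster has recorded somehow when it was added as a
peer. A real implementation would apply the same comparison inside an
overridden replicateLogEntries().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a replicated edit; not an HBase class. */
final class Edit {
    final String rowKey;
    final long timestampMillis;

    Edit(String rowKey, long timestampMillis) {
        this.rowKey = rowKey;
        this.timestampMillis = timestampMillis;
    }
}

/** Hypothetical filter that a custom sink service could call before applying edits. */
final class PeerCreationFilter {
    // Assumption: the time at which the sink was added as a peer, in epoch millis.
    private final long peerAddedMillis;

    PeerCreationFilter(long peerAddedMillis) {
        this.peerAddedMillis = peerAddedMillis;
    }

    /** Returns only the edits stamped at or after the peer-creation time. */
    List<Edit> filter(List<Edit> incoming) {
        List<Edit> kept = new ArrayList<>();
        for (Edit e : incoming) {
            if (e.timestampMillis >= peerAddedMillis) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        PeerCreationFilter filter = new PeerCreationFilter(1000L);
        List<Edit> batch = Arrays.asList(
                new Edit("written-before-peer", 900L),
                new Edit("written-after-peer", 1100L));
        // Prints the row key of each edit that survives the filter.
        for (Edit e : filter.filter(batch)) {
            System.out.println(e.rowKey);
        }
    }
}
```

Note that the comparison is only as reliable as the clock agreement between
the two clusters, so any skew could let an old edit through or drop a new one.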
> HBase replicates edits written before the replication peer is created
> ---------------------------------------------------------------------
>
> Key: HBASE-9888
> URL: https://issues.apache.org/jira/browse/HBASE-9888
> Project: HBase
> Issue Type: Bug
> Reporter: Dave Latham
>
> When creating a new replication peer the ReplicationSourceManager enqueues
> the currently open HLog to the ReplicationSource to ship to the destination
> cluster. The ReplicationSource starts at the beginning of the HLog and ships
> over any pre-existing writes.
> A workaround is to roll all the HLogs before enabling replication.
> A little background for how it affected us - we were migrating one cluster in
> a master-master pair, i.e. transitioning from A <-> B to B <-> C. After
> shutting down writes from A -> B we enabled writes from C -> B. However,
> this replicated some earlier writes that were in C's HLogs that had
> originated in A. Since we were running a version of HBase before HBASE-7709
> those writes then got caught in an infinite replication cycle, bringing
> down region servers with OOM errors because of HBASE-9865.
> However, in general, if one wants to manage what data gets replicated, one
> wouldn't expect that potentially very old writes would be included when
> setting up a new replication link.
--
This message was sent by Atlassian JIRA
(v6.1#6144)