[ https://issues.apache.org/jira/browse/HBASE-9888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814686#comment-13814686 ]
santosh banerjee commented on HBASE-9888:
-----------------------------------------

Did you consider writing a replication controller to be deployed on the sink cluster? The idea is to implement an edits-filter plugin as a ReplicationSinkService implementation, which can be defined in hbase-site.xml as follows:

    <property>
      <name>hbase.replication.sink.service</name>
      <value>com......replication.ReplicationController</value>
    </property>

You can consider extending the default implementation of ReplicationSinkService, org.apache.hadoop.hbase.replication.regionserver.Replication, and just overriding the replicateLogEntries() method to apply your filtering logic.

For your specific use case, your sink cluster needs to know when it was added as a peer. Then, assuming the two clusters are time-synced, the custom sink service can filter the edits based on this timestamp, leaving out the ones that were created earlier than the timestamp.

> HBase replicates edits written before the replication peer is created
> ---------------------------------------------------------------------
>
>                 Key: HBASE-9888
>                 URL: https://issues.apache.org/jira/browse/HBASE-9888
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Dave Latham
>
> When creating a new replication peer, the ReplicationSourceManager enqueues
> the currently open HLog to the ReplicationSource to ship to the destination
> cluster. The ReplicationSource starts at the beginning of the HLog and ships
> over any pre-existing writes.
> A workaround is to roll all the HLogs before enabling replication.
> A little background on how it affected us: we were migrating one cluster in
> a master-master pair, i.e. transitioning from A <-> B to B <-> C. After
> shutting down writes from A -> B we enabled writes from C -> B. However,
> this replicated some earlier writes that were in C's HLogs and had
> originated in A. Since we were running a version of HBase before HBASE-7709,
> those writes then got caught in an infinite replication cycle, bringing
> down region servers with OOM errors because of HBASE-9865.
> However, in general, if one wants to manage what data gets replicated, one
> wouldn't expect potentially very old writes to be included when
> setting up a new replication link.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
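
The timestamp filter suggested in the comment above could look roughly like the following. This is only a minimal sketch of the filtering rule, not HBase code: PeerTimestampFilter, Edit, and peerAddedAtMs are hypothetical names introduced for illustration, and no HBase classes are used. In a real sink service the same check would sit inside the overridden replicateLogEntries(), whose signature depends on the HBase version, and it assumes the two clusters' clocks are in sync.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the filtering rule only; NOT an HBase class.
final class PeerTimestampFilter {

    // Hypothetical stand-in for one replicated edit.
    static final class Edit {
        final String row;
        final long timestampMs; // write time recorded on the source cluster

        Edit(String row, long timestampMs) {
            this.row = row;
            this.timestampMs = timestampMs;
        }
    }

    // Assumed to be the time at which the sink was added as a peer.
    private final long peerAddedAtMs;

    PeerTimestampFilter(long peerAddedAtMs) {
        this.peerAddedAtMs = peerAddedAtMs;
    }

    // Keeps edits written at or after the peer-creation time and drops older ones.
    List<Edit> filter(List<Edit> incoming) {
        List<Edit> kept = new ArrayList<>();
        for (Edit e : incoming) {
            if (e.timestampMs >= peerAddedAtMs) {
                kept.add(e);
            }
        }
        return kept;
    }
}
```

How the sink learns the peer-creation time is left open here, as it is in the comment; the sketch simply takes it as a constructor argument.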