[ 
https://issues.apache.org/jira/browse/HBASE-9888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814686#comment-13814686
 ] 

santosh banerjee commented on HBASE-9888:
-----------------------------------------

Did you consider writing a replication controller to be deployed on the sink 
cluster?
The idea is to implement an edit filter plugin as a ReplicationSinkService 
implementation that can be defined in hbase-site.xml as follows:

<property>
<name>hbase.replication.sink.service</name>
<value>com......replication.ReplicationController</value>
</property>

You could extend the default implementation of ReplicationSinkService, 
org.apache.hadoop.hbase.replication.regionserver.Replication, and override 
only the replicateLogEntries() method to apply your filtering logic. For your 
specific use case, the sink cluster needs to know when it was added as a peer. 
Then, assuming the two clusters are time-synced, the custom sink service can 
filter the edits based on this timestamp, leaving out the ones that were 
created earlier than the timestamp.
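
To illustrate the filtering step only, here is a minimal, self-contained 
sketch. It deliberately does not use the HBase API: Edit and 
PeerTimestampFilter are hypothetical stand-ins, and peerAddedAtMs is assumed 
to be obtained by the sink cluster in some way not shown here. In a real sink 
service, the same comparison would sit inside the overridden 
replicateLogEntries() method, applied to the write time of each incoming 
edit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: drops edits written before the
// moment the sink cluster was added as a replication peer.
class PeerTimestampFilter {

    // Hypothetical stand-in for a replicated edit (row key plus write time).
    static final class Edit {
        final String rowKey;
        final long writeTimeMs;

        Edit(String rowKey, long writeTimeMs) {
            this.rowKey = rowKey;
            this.writeTimeMs = writeTimeMs;
        }
    }

    // Assumed to be the time (epoch millis) at which the peer was added.
    private final long peerAddedAtMs;

    PeerTimestampFilter(long peerAddedAtMs) {
        this.peerAddedAtMs = peerAddedAtMs;
    }

    // Returns only the edits written at or after the peer-added timestamp.
    // This is only meaningful if both clusters have synchronized clocks.
    List<Edit> filter(List<Edit> incoming) {
        List<Edit> kept = new ArrayList<>();
        for (Edit e : incoming) {
            if (e.writeTimeMs >= peerAddedAtMs) {
                kept.add(e);
            }
        }
        return kept;
    }
}
```

An edit written exactly at the peer-added timestamp is kept here; whether the 
boundary should be inclusive is a design choice that depends on how the 
timestamp is recorded.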


> HBase replicates edits written before the replication peer is created
> ---------------------------------------------------------------------
>
>                 Key: HBASE-9888
>                 URL: https://issues.apache.org/jira/browse/HBASE-9888
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Dave Latham
>
> When creating a new replication peer the ReplicationSourceManager enqueues 
> the currently open HLog to the ReplicationSource to ship to the destination 
> cluster.  The ReplicationSource starts at the beginning of the HLog and ships 
> over any pre-existing writes.
> A workaround is to roll all the HLogs before enabling replication.
> A little background for how it affected us - we were migrating one cluster in 
> a master-master pair.  I.e. transitioning from A <-> B to B <-> C.  After 
> shutting down writes from A -> B we enabled writes from C -> B.  However, 
> this replicated some earlier writes that were in C's HLogs that had 
> originated in A.  Since we were running a version of HBase before HBASE-7709 
> those writes then got caught in an infinite replication cycle, bringing 
> down region servers with OOM errors because of HBASE-9865.
> However, in general, if one wants to manage what data gets replicated, one 
> wouldn't expect that potentially very old writes would be included when 
> setting up a new replication link.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
