[
https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565869#comment-13565869
]
Ian Varley commented on HBASE-7709:
-----------------------------------
Ah, touche. That particular arrangement is still fine (the resulting graph, (B
-> C -> B) -> A, doesn't have any bad cycles). However, you raise a good point;
if you start with:
A -> B -> C
and then later add "C -> B", you'd get:
A -> (B -> C -> B)
which is a bad cycle. And C has no way of knowing about A -> B; as a peer, you
only know who you replicate to, not who replicates to you.
A cluster could keep track of who is replicating TO it; in ReplicationSink, we
could track all the cluster IDs that have ever sent data in, and report that
through the "who do you replicate with" API. So then it would let you build a
full graph, because you get the backwards edges.
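To make the idea concrete, here is a rough Python model (not HBase code) of merging each cluster's outbound peers with the inbound cluster IDs its sink has observed, then checking the combined graph. The names build_graph, has_bad_cycle, and inbound_seen are hypothetical, and "bad cycle" is taken to mean a cycle reachable from a cluster that is not itself on that cycle, which is my reading of the two examples above.

```python
from collections import defaultdict

def build_graph(outbound, inbound_seen):
    """Merge forward edges (who I replicate to) with backward edges
    (cluster IDs my sink has received data from). Both are dicts of sets."""
    edges = defaultdict(set)
    for src, peers in outbound.items():
        edges[src].update(peers)
    for dst, sources in inbound_seen.items():
        for src in sources:
            edges[src].add(dst)  # backward edge reported by the sink
    return edges

def _reaches_cycle_without(edges, origin):
    """DFS from origin's peers, never stepping back onto origin itself.
    Finding a cycle here means origin's edits would circulate forever."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def visit(node):
        color[node] = GRAY
        for nxt in edges.get(node, ()):
            if nxt == origin:
                continue  # edits are not sent back to their origin
            if color[nxt] == GRAY:
                return True  # back edge: a cycle that excludes origin
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[p] == WHITE and visit(p)
               for p in edges.get(origin, ()) if p != origin)

def has_bad_cycle(edges):
    nodes = set(edges) | {p for peers in edges.values() for p in peers}
    return any(_reaches_cycle_without(edges, n) for n in nodes)

# C is about to add "C -> B" and only sees forward edges: looks safe.
outbound = {"B": {"C"}, "C": {"B"}}
forward_only = has_bad_cycle(build_graph(outbound, {}))

# Once B's sink reports that A has sent it data, the bad cycle shows up.
with_backward = has_bad_cycle(build_graph(outbound, {"B": {"A"}}))
```

In this model forward_only is False and with_backward is True, which is the difference the backward edges make. It says nothing about the races or the "configured but no edits sent yet" case.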
Of course, there are still plenty of catches: the race conditions, plus the
possibility that someone is set up to replicate to you, but they just haven't
sent any edits yet.
Meh. With this level of complication, a solution in the direction you're
talking about (adding info to the WAL) might be safer.
> Infinite loop possible in Master/Master replication
> ---------------------------------------------------
>
> Key: HBASE-7709
> URL: https://issues.apache.org/jira/browse/HBASE-7709
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Reporter: Lars Hofhansl
> Fix For: 0.96.0, 0.94.6
>
>
> We just discovered the following scenario:
> # Clusters A and B are set up in master/master replication
> # By accident we had Cluster C replicate to Cluster A.
> Now all edits originating from C will be bouncing between A and B. Forever!
> The reason is that when the edit comes in from C the cluster ID is already set
> and won't be reset.
> We have a couple of options here:
> # Optionally only support master/master (not cycles of more than two
> clusters). In that case we can always reset the cluster ID in the
> ReplicationSource. That means that now cycles > 2 will have the data cycle
> forever. This is the only option that requires no changes in the HLog format.
> # Instead of a single cluster ID per edit, maintain an (unordered) set of
> cluster IDs that have seen this edit. Then in ReplicationSource we drop any
> edit that the sink has seen already. This is the cleanest approach, but it
> might need a lot of data stored per edit if there are many clusters involved.
> # Maintain a configurable counter of the maximum cycle size we want to
> support. Could default to 10 (even maybe even just). Store a hop-count in the
> WAL and the ReplicationSource increases that hop-count on each hop. If we're
> over the max, just drop the edit.
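For comparison, the reported scenario and options 2 and 3 from the description above can be simulated with a small Python toy. This is a sketch under stated assumptions, not HBase code: simulate, the policy names, and step_cap are all hypothetical, and each edit is modelled as a tuple rather than a WAL entry. The "origin_id" policy stands in for the existing single cluster ID check, and the hop-count policy is assumed to apply on top of that check.

```python
from collections import deque

def simulate(edges, origin, policy, max_hops=10, step_cap=1000):
    """Count how many times one edit is shipped between clusters.
    Hitting step_cap stands in for "bounces forever"."""
    # In-flight edit: (cluster holding it, clusters that have seen it, hop count)
    queue = deque([(origin, frozenset([origin]), 0)])
    deliveries = 0
    while queue and deliveries < step_cap:
        here, seen, hops = queue.popleft()
        for peer in sorted(edges.get(here, ())):
            if policy in ("origin_id", "hop_count") and peer == origin:
                continue  # single cluster ID: never ship back to the origin
            if policy == "seen_set" and peer in seen:
                continue  # option 2: the sink has already seen this edit
            if policy == "hop_count" and hops + 1 > max_hops:
                continue  # option 3: over the configured maximum, drop
            deliveries += 1
            queue.append((peer, seen | {peer}, hops + 1))
    return deliveries

# A and B are master/master; C replicates to A by accident.
edges = {"A": {"B"}, "B": {"A"}, "C": {"A"}}

looping = simulate(edges, "C", "origin_id")   # runs until step_cap
with_set = simulate(edges, "C", "seen_set")   # C -> A, A -> B, then dropped
with_hops = simulate(edges, "C", "hop_count") # stops after max_hops shipments
```

In this toy the set-based option stops the edit as soon as every reachable cluster has it, while the hop-count option still lets it bounce until the limit is reached, at the cost of only one integer per edit.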