[
https://issues.apache.org/jira/browse/HBASE-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527752#comment-13527752
]
Feng Honghua commented on HBASE-7280:
-------------------------------------
Thanks, Jean-Daniel.
But even if REPLICATION_SCOPE is implemented as routing information, I don't
think it is as flexible as adding a per-peer table/CF configuration. Let me know
if I'm wrong in understanding how REPLICATION_SCOPE would be used as routing
information: edits in the master cluster are shipped to all peer clusters whose
peer_ids are less than or equal to the REPLICATION_SCOPE. But what if a newly
added peer wants to replicate one table/CF with REPLICATION_SCOPE=A and another
table/CF with REPLICATION_SCOPE=E, but doesn't want the tables/CFs with
REPLICATION_SCOPE=B/C/D (A>B>C>D>E here)? Interpreting REPLICATION_SCOPE as a
bit array and treating each bit as a peer_id has a similar problem. (At the
very least, we need to change REPLICATION_SCOPE whenever the original value
can't satisfy a later-added peer's replication requirement.)
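To make the objection concrete, here is a minimal sketch of the threshold rule as I understand it from the discussion. Everything in it is illustrative: the class name, the helper methods, and the numeric scope values 5..1 standing in for A..E are assumptions, not HBase code. Under this rule the set a peer receives is always "everything at or above some scope", so no peer_id can select A and E while skipping B/C/D.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrates the threshold routing rule under discussion (hypothetical, not HBase code). */
final class ThresholdScopeDemo {

    /** Rule under discussion: an edit is shipped to a peer when peerId <= scope. */
    static boolean shipped(int peerId, int scope) {
        return peerId <= scope;
    }

    /** Returns the table/CF names that a peer with the given id would receive. */
    static Set<String> received(int peerId, Map<String, Integer> scopes) {
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, Integer> e : scopes.entrySet()) {
            if (shipped(peerId, e.getValue())) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    /** Example scopes with A > B > C > D > E; the numeric values are made up. */
    static Map<String, Integer> exampleScopes() {
        Map<String, Integer> scopes = new LinkedHashMap<>();
        scopes.put("A", 5);
        scopes.put("B", 4);
        scopes.put("C", 3);
        scopes.put("D", 2);
        scopes.put("E", 1);
        return scopes;
    }

    /** True if some peer id in [minId, maxId] receives exactly the wanted set. */
    static boolean anyPeerIdYields(Set<String> wanted, Map<String, Integer> scopes,
                                   int minId, int maxId) {
        for (int id = minId; id <= maxId; id++) {
            if (received(id, scopes).equals(wanted)) {
                return true;
            }
        }
        return false;
    }
}
```

With these example values, any peer_id low enough to receive E also receives A through D, so `anyPeerIdYields` for the set {A, E} is false over the whole id range.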
REPLICATION_SCOPE doesn't come to the rescue here because in many cases the
master cluster doesn't know exactly which peer clusters will or want to
replicate which tables/CFs from it when it creates those tables/CFs. In
contrast, each peer cluster knows exactly which tables/CFs to replicate from the
master cluster when it adds itself as a peer of the master cluster. By
introducing a table/CF list configuration when adding a peer, we don't need to
figure out in advance which (or how many) peers can replicate a table/CF when
creating it in the master cluster, and we don't need to change the
REPLICATION_SCOPE later on.
ReplicationSourceManager just listens on the peer ZK nodes and adds a new
ReplicationSource for the new peer with the configured table/CF list; that
source reads, filters, and ships edits of the configured tables/CFs to the
corresponding peer.
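A rough sketch of the filtering step in such a per-peer source might look like the following. The class `PeerTableCfFilter`, its method `shouldShip`, and the convention that an empty CF set means "all CFs of that table" are my assumptions for illustration; none of them exist in HBase today.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-peer filter; the class and method names are not from HBase. */
final class PeerTableCfFilter {
    // Table name -> column families to replicate.
    // Assumed convention: an empty set means "all CFs of the table".
    private final Map<String, Set<String>> tableCfs;

    PeerTableCfFilter(Map<String, Set<String>> tableCfs) {
        // Defensive copy so later changes to the caller's map do not leak in.
        Map<String, Set<String>> copy = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : tableCfs.entrySet()) {
            copy.put(e.getKey(), Collections.unmodifiableSet(new HashSet<>(e.getValue())));
        }
        this.tableCfs = Collections.unmodifiableMap(copy);
    }

    /** Decides whether an edit for the given table and CF goes to this peer. */
    boolean shouldShip(String table, String cf) {
        Set<String> cfs = tableCfs.get(table);
        if (cfs == null) {
            return false; // table not configured for this peer: never ship it
        }
        return cfs.isEmpty() || cfs.contains(cf);
    }
}
```

Because edits for unconfigured tables are dropped on the master side, the peer would never see an edit for a table it does not have, which also avoids the TableNotFoundException retry loop described in this issue.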
ReplicationSource also needs to listen on its peer ZK node for table/CF
configuration changes, which in turn influence which edits are shipped to the
peer from then on.
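One possible way to apply such a change without stopping the shipping thread is to swap the configuration atomically. This is only a sketch under assumptions: `ReloadablePeerTables` and `onConfigChanged` are hypothetical names, the ZK watcher that would call `onConfigChanged` is not shown, and for brevity it tracks table names only.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder for a peer's replicated tables, swapped on config change. */
final class ReloadablePeerTables {
    private final AtomicReference<Set<String>> tables;

    ReloadablePeerTables(Set<String> initial) {
        this.tables = new AtomicReference<>(snapshot(initial));
    }

    /** Would be called from a (hypothetical) watcher callback on the peer's ZK node. */
    void onConfigChanged(Set<String> newTables) {
        tables.set(snapshot(newTables));
    }

    /** Consulted by the shipping loop for each edit; always sees the latest config. */
    boolean shouldShip(String table) {
        return tables.get().contains(table);
    }

    // Immutable copy, so readers never observe a half-updated set.
    private static Set<String> snapshot(Set<String> s) {
        return Collections.unmodifiableSet(new HashSet<>(s));
    }
}
```

Edits read after the swap are filtered against the new list; edits already shipped are unaffected, which matches the "from then on" behavior described above.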
Any opinion?
> TableNotFoundException thrown in peer cluster will incur endless retry for
> shipEdits, which in turn block following normal replication
> --------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-7280
> URL: https://issues.apache.org/jira/browse/HBASE-7280
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 0.94.2
> Reporter: Feng Honghua
> Fix For: 0.94.4
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> In cluster replication, if the master cluster has 2 tables whose column
> families are declared with replication scope = 1, and a peer cluster is added
> that has only 1 table with the same name as in the master cluster, then the
> ReplicationSource (a thread in the master cluster) for this peer ships edits
> (logs) for both tables to the peer. The peer fails to apply the edits due to
> TableNotFoundException, and this exception is also returned to the original
> shipper (the ReplicationSource in the master cluster). The shipper then falls
> into an endless retry of shipping the failed edits without proceeding to read
> the remaining (newer) log files and ship the following edits (possibly the
> normal, expected edits for the registered table).
> The symptom is that the TableNotFoundException incurs endless retries and
> blocks normal table replication.