[jira] [Commented] (HBASE-7280) TableNotFoundException thrown in peer cluster will incur endless retry for shipEdits, which in turn block following normal replication

Feng Honghua (JIRA) Sun, 09 Dec 2012 23:41:26 -0800

    [ https://issues.apache.org/jira/browse/HBASE-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527752#comment-13527752 ]


Feng Honghua commented on HBASE-7280:
-------------------------------------

Thanks, Jean-Daniel.

But even if REPLICATION_SCOPE is implemented, I don't think it's as flexible as
adding a per-peer table/CF configuration. Please correct me if I've misunderstood
how REPLICATION_SCOPE is used as routing information: edits in the master
cluster will be shipped to all peer clusters whose peer_id is less than or equal
to the REPLICATION_SCOPE. But what if a newly added peer wants to replicate a
table/CF with REPLICATION_SCOPE=A and another table/CF with REPLICATION_SCOPE=E,
but not the tables/CFs with REPLICATION_SCOPE=B/C/D (A>B>C>D>E here)? Its
peer_id must be <= E for it to receive E, so it would receive B, C and D as well.
Interpreting REPLICATION_SCOPE as a bit array, with each bit representing a
peer_id, has a similar problem: at the very least, we need to change
REPLICATION_SCOPE whenever the original value can't satisfy the replication
requirements of a later-added peer.
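To make the routing argument concrete, here is a small Java sketch (not HBase code). It assumes the threshold rule described above, i.e. a peer with id p receives a table/CF if and only if p <= its REPLICATION_SCOPE, and maps A..E to the hypothetical values 5..1. Under that assumption no peer id receives exactly {A, E}. The bit-array variant can express such a selection, but only after the existing scope value is rewritten for the new peer. ScopeRouting and its methods are made-up names for illustration.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of the two REPLICATION_SCOPE interpretations discussed above. */
class ScopeRouting {

    /** Threshold rule (assumed): edits go to every peer whose id <= scope. */
    static boolean receivesByThreshold(int peerId, int scope) {
        return peerId <= scope;
    }

    /** Bit-array rule (assumed): bit number peerId of the scope selects that peer. */
    static boolean receivesByBitmask(int peerId, int scope) {
        return (scope & (1 << peerId)) != 0;
    }

    /** Scopes of all tables/CFs a given peer would receive under the threshold rule. */
    static Set<Integer> scopesReceivedByThreshold(int peerId, List<Integer> scopes) {
        Set<Integer> received = new TreeSet<>();
        for (int scope : scopes) {
            if (receivesByThreshold(peerId, scope)) {
                received.add(scope);
            }
        }
        return received;
    }

    public static void main(String[] args) {
        // Hypothetical scopes for A > B > C > D > E.
        List<Integer> scopes = List.of(5, 4, 3, 2, 1);
        for (int peerId = 1; peerId <= 5; peerId++) {
            // A peer that wants E (scope 1) must have id <= 1, so it also receives A, B, C and D.
            System.out.println("peer " + peerId + " -> " + scopesReceivedByThreshold(peerId, scopes));
        }

        // Bit-array variant: table A currently selects peers 0 and 1 (bits 0 and 1).
        int scopeA = 0b011;
        int newPeer = 2;
        System.out.println(receivesByBitmask(newPeer, scopeA)); // false
        scopeA |= 1 << newPeer; // the existing REPLICATION_SCOPE has to be rewritten
        System.out.println(receivesByBitmask(newPeer, scopeA)); // true
    }
}
```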

REPLICATION_SCOPE isn't a remedy here because in many cases the master cluster
doesn't know, when it creates tables/CFs, exactly which peer clusters will (or
want to) replicate which tables/CFs from it. In contrast, each peer cluster knows
exactly which tables/CFs to replicate from the master cluster when it adds itself
as a peer of the master cluster. By introducing a table/CF list configuration
when adding a peer, we no longer have to figure out in advance which (and how
many) peers may replicate a table/CF when creating it in the master cluster, and
we never need to change REPLICATION_SCOPE later on. ReplicationSourceManager just
listens on the peer ZK nodes and, for each new peer, adds a new ReplicationSource
with the configured table/CF list; that source reads, filters, and ships edits of
the configured tables/CFs to the corresponding peer.
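A minimal Java sketch of the per-peer filtering step proposed above, assuming the table/CF list is a map from table name to the column families the peer wants, where an empty set means "all families of that table". PeerTableCfFilter and Edit are hypothetical stand-ins, not HBase classes; the real ReplicationSource works on WAL entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-peer table/CF filter; the config shape is an assumption. */
class PeerTableCfFilter {

    /** Minimal stand-in for one WAL edit (one table, one column family). */
    static final class Edit {
        final String table;
        final String family;

        Edit(String table, String family) {
            this.table = table;
            this.family = family;
        }
    }

    // table name -> wanted column families (empty set = every family of that table)
    private final Map<String, Set<String>> tableCfs;

    PeerTableCfFilter(Map<String, Set<String>> tableCfs) {
        this.tableCfs = Map.copyOf(tableCfs);
    }

    /** True if this peer is configured to receive the edit's table/CF. */
    boolean accepts(Edit edit) {
        Set<String> families = tableCfs.get(edit.table);
        if (families == null) {
            return false; // table not listed for this peer, so it is never shipped
        }
        return families.isEmpty() || families.contains(edit.family);
    }

    /** Keeps only the edits this peer asked for, preserving their order. */
    List<Edit> filter(List<Edit> batch) {
        List<Edit> kept = new ArrayList<>();
        for (Edit edit : batch) {
            if (accepts(edit)) {
                kept.add(edit);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> config = Map.of(
                "t1", Set.of(),        // all families of t1
                "t2", Set.of("cf1"));  // only t2:cf1
        PeerTableCfFilter filter = new PeerTableCfFilter(config);
        List<Edit> batch = List.of(
                new Edit("t1", "cfA"),
                new Edit("t2", "cf1"),
                new Edit("t2", "cf2"),
                new Edit("t3", "cf1")); // a table this peer does not replicate
        System.out.println(filter.filter(batch).size()); // prints 2
    }
}
```

Because edits for unlisted tables are dropped before shipping, the peer is never asked to apply an edit for a table it does not have.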

Each ReplicationSource also needs to listen on its peer ZK node for changes to
the table/CF configuration, which in turn determine which edits are shipped to
the peer from then on.
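One way a source could pick up such configuration changes safely is to keep the current table list as an immutable snapshot and swap it atomically, reading it once per batch. The Java sketch below assumes a watcher on the peer's ZK node (not shown, and no ZooKeeper API is used here) calls a hypothetical onTableListChanged hook; edits are reduced to their table names for brevity. ReconfigurableSource is an illustrative name, not an HBase class.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.Collectors;

/** Hypothetical replication source whose table list can change at runtime. */
class ReconfigurableSource {

    // Immutable snapshot of the tables this peer wants; replaced atomically on change.
    private final AtomicReference<Set<String>> tables;

    ReconfigurableSource(Set<String> initialTables) {
        this.tables = new AtomicReference<>(Set.copyOf(initialTables));
    }

    /** Hypothetical hook, assumed to be called by a watcher on the peer's ZK node. */
    void onTableListChanged(Set<String> newTables) {
        tables.set(Set.copyOf(newTables));
    }

    /** Picks the edits (identified here only by table name) to ship in this batch. */
    List<String> selectForShipping(List<String> editTables) {
        Set<String> snapshot = tables.get(); // read once so a whole batch uses one config
        return editTables.stream()
                .filter(snapshot::contains)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ReconfigurableSource source = new ReconfigurableSource(Set.of("t1"));
        List<String> batch = List.of("t1", "t2", "t1");
        System.out.println(source.selectForShipping(batch)); // [t1, t1]
        source.onTableListChanged(Set.of("t1", "t2"));
        System.out.println(source.selectForShipping(batch)); // [t1, t2, t1]
    }
}
```

Swapping a whole immutable set avoids locking on the shipping path, and a change only affects batches read after it, which matches "from then on".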

Any opinions?
                
> TableNotFoundException thrown in peer cluster will incur endless retry for 
> shipEdits, which in turn block following normal replication
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7280
>                 URL: https://issues.apache.org/jira/browse/HBASE-7280
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.94.2
>            Reporter: Feng Honghua
>             Fix For: 0.94.4
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> In cluster replication, if the master cluster has 2 tables whose column
> families are declared with replication scope = 1, and we add a peer cluster
> that has only 1 table with the same name as in the master cluster, then the
> ReplicationSource (a thread in the master cluster) for this peer ships edits
> (logs) for both tables to the peer. The peer fails to apply the edits due to
> TableNotFoundException, and this exception is also returned to the original
> shipper (the ReplicationSource in the master cluster). The shipper then falls
> into an endless retry of the failed edits without proceeding to read the
> remaining (newer) log files and ship the following edits (possibly the
> normal, expected edits for the registered table). The symptom is that the
> TableNotFoundException causes endless retries and blocks normal table
> replication.
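A small Java simulation of the failure mode described in the issue (not the actual ReplicationSource code): the shipper retries the same batch until the peer accepts it, so a single batch that contains edits for a table missing on the peer stalls every batch behind it. Assumptions: the peer rejects a whole batch if any of its tables is missing, batches are reduced to lists of table names, and the real loop's unbounded retry is capped at a hypothetical maxAttempts so the demo terminates.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical simulation of the shipEdits retry loop blocking on a missing table. */
class ShipEditsBlockingDemo {

    /** Peer side: applies a batch only if every table in it exists (else a TableNotFoundException-like failure). */
    static boolean peerApplies(List<String> batchTables, Set<String> peerTables) {
        return peerTables.containsAll(batchTables);
    }

    /** Returns how many batches were shipped before the shipper got stuck (or all of them). */
    static int shipAll(List<List<String>> batches, Set<String> peerTables, int maxAttempts) {
        int shipped = 0;
        for (List<String> batch : batches) {
            boolean ok = false;
            for (int attempt = 0; attempt < maxAttempts && !ok; attempt++) {
                ok = peerApplies(batch, peerTables); // same batch, same failure every time
            }
            if (!ok) {
                return shipped; // stuck: later batches (newer logs) are never read or shipped
            }
            shipped++;
        }
        return shipped;
    }

    public static void main(String[] args) {
        Set<String> peerTables = Set.of("t1");
        List<List<String>> batches = List.of(
                List.of("t1"),
                List.of("t1", "t2"), // t2 does not exist on the peer
                List.of("t1"));      // normal edits for t1, never reached
        System.out.println(shipAll(batches, peerTables, 10)); // prints 1
    }
}
```

Retrying harder cannot help, since the failure is deterministic; the edits for t2 have to be kept out of the batch before it is shipped.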

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
