Feng Honghua created HBASE-8751:
-----------------------------------

             Summary: Enable peer cluster to choose/change the 
ColumnFamilies/Tables it really wants to replicate from a source cluster
                 Key: HBASE-8751
                 URL: https://issues.apache.org/jira/browse/HBASE-8751
             Project: HBase
          Issue Type: Improvement
          Components: Replication
            Reporter: Feng Honghua


Consider the following scenario (all column families have replication-scope=1):

1) cluster S has 3 tables, table A has cfA,cfB, table B has cfX,cfY, table C 
has cf1,cf2.

2) cluster X wants to replicate table A : cfA, table B : cfX and table C from 
cluster S.

3) cluster Y wants to replicate table B : cfY, table C : cf2 from cluster S.

The current replication implementation can't achieve this, because it pushes 
the data of every replicable column family from cluster S to all of its peers 
(X and Y in this scenario).

This improvement provides a fine-grained replication scheme that enables a peer 
cluster to choose the column families/tables it really wants from the source 
cluster:

A). Set the table:cf-list for a peer when adding it with add_peer:
  hbase-shell> add_peer '3', "zk:1100:/hbase", "table1; table2:cf1,cf2; table3:cf2"

B). View the table:cf-list config for a peer using show_peer_tableCFs:
  hbase-shell> show_peer_tableCFs "1"

C). Change/set the table:cf-list for a peer using set_peer_tableCFs:
  hbase-shell> set_peer_tableCFs '2', "table1:cfX; table2:cf1; table3:cf1,cf2"

In this scheme, replication-scope=1 only means a column family CAN be replicated 
to other clusters; the peer's table:cf-list determines WHICH tables and column 
families are actually replicated to that specific peer.

For backward compatibility, an empty table:cf-list replicates every replicable 
table and column family. (This means a peer cannot be configured to replicate 
nothing from a source cluster. We think that is reasonable: if nothing is to be 
replicated, why add the peer at all?)
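
The selection rules above can be sketched roughly as follows. This is only an 
illustration in Python, not the code from the patch: the function names 
parse_table_cfs and should_replicate are hypothetical, and treating an entry 
such as "table1:" (a colon with no column families) as "the whole table" is an 
assumption. The string format is the one used in the shell examples.

```python
from typing import Dict, Optional, Set

# Hypothetical helpers for illustration; not part of HBase.
TableCFs = Dict[str, Optional[Set[str]]]


def parse_table_cfs(spec: str) -> TableCFs:
    """Parse "table1; table2:cf1,cf2" into {table: set of cfs, or None}.

    None means "every replicable column family of that table".
    An empty dict means no table:cf-list was configured for the peer.
    """
    result: TableCFs = {}
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        table, _, cfs = entry.partition(":")
        table = table.strip()
        if not table:
            raise ValueError("missing table name in entry: %r" % entry)
        names = {cf.strip() for cf in cfs.split(",") if cf.strip()}
        # Assumption: a table listed without column families selects all of them.
        result[table] = names or None
    return result


def should_replicate(table_cfs: TableCFs, table: str, cf: str,
                     replication_scope: int) -> bool:
    """Decide whether an edit to table:cf is shipped to one specific peer."""
    if replication_scope != 1:
        return False  # the column family is not replicable at all
    if not table_cfs:
        return True   # empty list: replicate everything (backward compatible)
    if table not in table_cfs:
        return False  # the peer did not ask for this table
    allowed = table_cfs[table]
    return allowed is None or cf in allowed


# The scenario from the description: peers X and Y of source cluster S.
peer_x = parse_table_cfs("A:cfA; B:cfX; C")
peer_y = parse_table_cfs("B:cfY; C:cf2")
```

With these definitions, an edit to table A, cfB would be shipped to neither 
peer, while an edit to table C, cf2 would be shipped to both.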

This improvement addresses the exact problem raised by the first FAQ in 
"http://hbase.apache.org/replication.html":
  "GLOBAL means replicate? Any provision to replicate only to cluster X and not 
to cluster Y? or is that for later?
  Yes, this is for much later."

I also noticed a mention that "replication-scope" is an integer rather than a 
boolean to allow for this kind of fine-grained replication. However, I don't 
think extending "replication-scope" can offer the same granularity and 
flexibility as the per-peer replication configuration described above.

This improvement has been running smoothly in our production clusters (Xiaomi) 
for several months.
