Demai Ni created HBASE-9220:
-------------------------------

             Summary: An API(and shell command) to list tables replicated TO 
the current cluster 
                 Key: HBASE-9220
                 URL: https://issues.apache.org/jira/browse/HBASE-9220
             Project: HBase
          Issue Type: New Feature
          Components: Replication, shell
         Environment: clusters set up as Master and Slave for replication of 
tables
            Reporter: Demai Ni


This JIRA tracks the continuing discussion from HBASE-8663, in the hope of 
surfacing a better way to handle the following use case: 

An administrator or developer who has 'list table' access to a cluster would 
like to know which tables/families are replicated to that cluster (i.e. the 
slave), so that he/she won't mess things up.

HBASE-8663 covered the API to get the list of tables and families replicated 
from the current cluster (i.e. the Master), but reached no conclusion on how to 
do the same for tables replicated TO the current cluster (i.e. the slave). 
Several ideas were entertained during HBASE-8663's discussion and are 
summarized here: 

* *Idea 1*: on the Slave cluster, add a new String attribute REPLICATION_MASTER 
to HColumnDescriptor to indicate the cluster this column family is replicated 
from. A check can be added to ensure the value of REPLICATION_MASTER is valid 
at the time it is set. 
** problem 1) a slave can have more than one master (a minor problem); 
** problem 2) consistency is broken if the Master cluster runs 'remove_peer' (a 
major problem, which requires a synchronous call to the remote master/peer 
cluster)

* *Idea 2*: reuse REPLICATION_SCOPE and give a new meaning to the value '-1'. 
If a table is replicated to this cluster, its REPLICATION_SCOPE must be set to 
-1 before replication can occur
** problem 1) incompatible change. Currently slave-side tables look just like 
normal tables; the change would require users to explicitly set 
REPLICATION_SCOPE = -1
** problem 2) incompatible change. Currently any non-zero value of 
REPLICATION_SCOPE is treated as if it were 1 (global replication), so the 
change would impact existing tables
** problem 3) the value '-1' only tells the user that the table is replicated 
to the current cluster; it cannot indicate the source/Master cluster
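
The incompatibility in problem 2 can be made concrete with a small sketch. It models only the behaviour described in this issue; the class, enum and method names are hypothetical and are not HBase APIs.

```java
// Hypothetical sketch of Idea 2; none of these names exist in HBase.
final class ReplicationScopeSketch {
    enum Role { LOCAL, SOURCE, SINK }

    // Behaviour as described in this issue: any non-zero scope acts like 1 (global).
    static Role currentRole(int scope) {
        return scope != 0 ? Role.SOURCE : Role.LOCAL;
    }

    // Behaviour proposed by Idea 2: -1 marks a family that is replicated TO this cluster.
    static Role proposedRole(int scope) {
        if (scope == -1) {
            return Role.SINK;
        }
        return scope != 0 ? Role.SOURCE : Role.LOCAL;
    }

    public static void main(String[] args) {
        // Print both interpretations side by side for a few scope values.
        for (int scope : new int[] {0, 1, -1, 2}) {
            System.out.println(scope + ": " + currentRole(scope) + " -> " + proposedRole(scope));
        }
    }
}
```

Under these assumptions, an existing family that happens to carry REPLICATION_SCOPE = -1 would silently change from a replication source to a replication sink, which is the compatibility concern raised above.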

* *Idea 3*: invent a new HColumnDescriptor attribute 'replication_peers', an 
array of IDs. We can use positive IDs for target clusters and negative IDs for 
source clusters, for example: 
{code}
hbase(main):004:0> list_peers
 PEER_ID CLUSTER_KEY STATE
 1 Slave_A.hbase.com:2181:/hbase ENABLED
 2 Slave_B.hbase.com:2181:/hbase ENABLED
 3 Slave_Master_C.hbase.com:2181:/hbase ENABLED
-1 Master_A.hbase.com:2181:/hbase ENABLED
-2 Master_B.hbase.com:2181:/hbase ENABLED
-3 Slave_Master_C.hbase.com:2181:/hbase ENABLED
>describe table
't1_dn', {NAME => 'cf1', REPLICATION_PEERS => '1,2,3', ..}
't2_dn', {NAME => 'cf1', REPLICATION_PEERS => '-1,-2',..}
't3_dn', {NAME => 'cf1', REPLICATION_PEERS => '3,-3',..}

t1_dn#cf1 is replicated from this cluster, and its slave clusters are 
Slave_A, Slave_B and Slave_Master_C
t2_dn#cf1 is replicated to this cluster, and its master clusters are Master_A 
and Master_B
t3_dn#cf1 is set up as Master_Slave replication with 
Slave_Master_C.hbase.com (although the two peers do not have to be the same cluster) 
{code}
** problem: similar to Idea 1, although this is an improved version. A 
synchronous call can be implemented through the peer ID
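
As a rough illustration of the signed-ID convention in Idea 3, the sketch below splits a REPLICATION_PEERS value into target and source peer IDs. The attribute does not exist in HBase today; the class and method names are hypothetical, and rejecting ID 0 is an assumption made here because 0 has no sign.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of Idea 3; REPLICATION_PEERS is a proposed attribute, not an HBase API.
final class ReplicationPeersSketch {

    // Returns the IDs of one direction: negative IDs (sources) or positive IDs (targets).
    static List<Integer> select(String peers, boolean wantSources) {
        List<Integer> ids = new ArrayList<>();
        for (String token : peers.split(",")) {
            String trimmed = token.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate empty values and stray commas
            }
            int id = Integer.parseInt(trimmed);
            if (id == 0) {
                // Assumption: 0 carries no direction, so it is rejected.
                throw new IllegalArgumentException("peer ID 0 has no direction");
            }
            if ((id < 0) == wantSources) {
                ids.add(id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Values taken from the 'describe' example above.
        for (String peers : new String[] {"1,2,3", "-1,-2", "3,-3"}) {
            System.out.println(peers + " targets=" + select(peers, false)
                    + " sources=" + select(peers, true));
        }
    }
}
```

With this split, "tables replicated TO the current cluster" would be exactly those families whose source list is non-empty.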

* *Idea 4*: a replication central controller that resides outside of all the 
clusters. The controller would communicate with all clusters and keep their 
info consistent. It could be a very good operational manager for users who 
have 10+ clusters to oversee, and other features (such as backup/restore) could 
leverage the framework
** problem: not really a problem per se, except that the effort for the whole 
solution is pretty large and needs some cleanup work first. For example, 
'add_peer' currently doesn't check the value, and we need to fix that first; 
and replication setup relies on manually creating the table on the peer slave, 
whereas we may want to ensure the same schema and create it automatically from 
the Master cluster. 
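
The missing 'add_peer' check mentioned under Idea 4 could start with a purely syntactic test of the cluster key. The sketch below assumes the quorum:port:znode layout seen in the list_peers example above; the class and method names are hypothetical, and it does not contact ZooKeeper or verify that the peer cluster exists.

```java
// Hypothetical sketch of a syntactic cluster-key check; not an HBase API.
final class ClusterKeySketch {

    // Accepts keys shaped like "host1,host2:2181:/hbase".
    static boolean looksValid(String clusterKey) {
        String[] parts = clusterKey.split(":");
        if (parts.length != 3) {
            return false; // expect quorum, client port and znode parent
        }
        String quorum = parts[0];
        String port = parts[1];
        String znode = parts[2];
        if (quorum.isEmpty() || !znode.startsWith("/")) {
            return false;
        }
        try {
            int p = Integer.parseInt(port);
            return p > 0 && p <= 65535;
        } catch (NumberFormatException e) {
            return false; // port is not a number
        }
    }

    public static void main(String[] args) {
        System.out.println(looksValid("Slave_A.hbase.com:2181:/hbase"));
        System.out.println(looksValid("not-a-cluster-key"));
    }
}
```

A real check would probably also need to confirm that the remote cluster is reachable, which is where the synchronous call discussed in Ideas 1 and 3 comes in.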


