[
https://issues.apache.org/jira/browse/HBASE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770304#comment-13770304
]
Feng Honghua commented on HBASE-8751:
-------------------------------------
bq. Well, even in that case the variable is being read and modified in
different threads, and should therefore qualify for some form of
synchronization. Hence declaring it volatile sounds necessary. What do you say?
Some of my understanding below may be wrong; please point out any mistakes :)
1. In ReplicationSource, zkHelper.getTableCFs(peerId) is called for each hlog
entry to push, so the 'current' tableCFs map is fetched every time rather than
cached in a local variable. The value in main memory, not a thread-local copy,
is therefore always used. (This is what volatile is concerned with, right?)
2. tableCFs is a map (a reference), and the ZooKeeper event thread changes it
in this way:
Map<String, List<String>> curMap = new HashMap<String, List<String>>();
...//parse the new tableCFsConfig in zk node and populate to curMap
this.tableCFs = curMap;
When the ReplicationSource thread reads tableCFs at any point while
ReplicationPeer is updating it as above, it gets a reference to either the old
map or the new map (each of which is internally consistent), never a map with
inconsistent contents. (Inconsistency could occur if ReplicationPeer updated
tableCFs by removing/adding entries in place rather than by switching the
tableCFs reference as a whole.)
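For reference, the reference-swap pattern from point 2 can be sketched as
below. This is a minimal sketch, not the code in the patch: TableCfsHolder is
a hypothetical stand-in for ReplicationPeer, and the volatile modifier follows
the reviewer's suggestion. Under the Java Memory Model a volatile write
happens-before a later read of the same field, so a reader that sees the new
reference also sees the fully populated map; a plain field carries no such
guarantee.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the peer object discussed above (not the real class).
class TableCfsHolder {
    // volatile: publishing the reference also publishes the map contents
    // that were written before the assignment.
    private volatile Map<String, List<String>> tableCFs =
            Collections.<String, List<String>>emptyMap();

    // Updater side (the ZooKeeper event thread in the discussion):
    // build a fresh map, then swap the reference in a single assignment.
    void setTableCFs(Map<String, List<String>> parsed) {
        Map<String, List<String>> curMap = new HashMap<String, List<String>>(parsed);
        this.tableCFs = Collections.unmodifiableMap(curMap);
    }

    // Reader side (the ReplicationSource thread): returns either the old
    // map or the new map, never one that is being modified in place.
    Map<String, List<String>> getTableCFs() {
        return tableCFs;
    }
}
```

A reader that needs several lookups for one hlog entry would call
getTableCFs() once and keep the returned reference for that entry, so all
lookups see the same snapshot.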
Actually, when I wrote this piece of code I did hesitate over whether
synchronization was needed here; based on the arguments above I thought it was
not.
Please help clarify if my understanding is wrong, thanks :)
> Enable peer cluster to choose/change the ColumnFamilies/Tables it really want
> to replicate from a source cluster
> ----------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-8751
> URL: https://issues.apache.org/jira/browse/HBASE-8751
> Project: HBase
> Issue Type: Improvement
> Components: Replication
> Reporter: Feng Honghua
> Attachments: HBASE-8751-0.94-V0.patch
>
>
> Consider this scenario (all CFs have replication-scope=1):
> 1) cluster S has 3 tables, table A has cfA,cfB, table B has cfX,cfY, table C
> has cf1,cf2.
> 2) cluster X wants to replicate table A : cfA, table B : cfX and table C from
> cluster S.
> 3) cluster Y wants to replicate table B : cfY, table C : cf2 from cluster S.
> The current replication implementation can't achieve this, since it pushes
> the data of all replicable column families from cluster S to all of its
> peers, X and Y in this scenario.
> This improvement provides a fine-grained replication scheme which enables a
> peer cluster to choose the column families/tables it really wants from the
> source cluster:
> A). Set the table:cf-list for a peer when addPeer:
> hbase-shell> add_peer '3', "zk:1100:/hbase", "table1; table2:cf1,cf2; table3:cf2"
> B). View the table:cf-list config for a peer using show_peer_tableCFs:
> hbase-shell> show_peer_tableCFs "1"
> C). Change/set the table:cf-list for a peer using set_peer_tableCFs:
> hbase-shell> set_peer_tableCFs '2', "table1:cfX; table2:cf1; table3:cf1,cf2"
> In this scheme, replication-scope=1 only means a column family CAN be
> replicated to other clusters; the table:cf-list alone determines
> WHICH cf/table will actually be replicated to a specific peer.
> For backward compatibility, an empty table:cf-list replicates all
> replicable cf/table. (This means we don't allow a peer which replicates
> nothing from a source cluster; we think that is reasonable: if it replicates
> nothing, why bother adding a peer?)
> This improvement addresses the exact problem raised by the first FAQ in
> "http://hbase.apache.org/replication.html":
> "GLOBAL means replicate? Any provision to replicate only to cluster X and
> not to cluster Y? or is that for later?
> Yes, this is for much later."
> I also noticed somebody mentioned that "replication-scope" is an integer
> rather than a boolean for exactly this fine-grained replication purpose, but
> I think extending "replication-scope" can't achieve the same granularity and
> flexibility as the per-peer replication configuration above.
> This improvement has been running smoothly in our production clusters
> (Xiaomi) for several months.
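
The table:cf-list strings in the shell examples above ("table1; table2:cf1,cf2;
table3:cf2") could be turned into the Map<String, List<String>> that tableCFs
holds roughly as follows. This is only a sketch: TableCfsParser is a
hypothetical name, and representing "table with no cf list" as an empty list
(meaning all replicable CFs of that table) is an assumption, not necessarily
what the patch does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the name and the empty-list convention are assumptions.
class TableCfsParser {
    // Parses "table1; table2:cf1,cf2" into {table1=[], table2=[cf1, cf2]}.
    // A null or blank config yields an empty map (replicate everything).
    static Map<String, List<String>> parse(String config) {
        Map<String, List<String>> result = new LinkedHashMap<String, List<String>>();
        if (config == null || config.trim().isEmpty()) {
            return result;
        }
        for (String entry : config.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a trailing or doubled ';'
            }
            // Split into the table name and an optional comma-separated cf list.
            String[] parts = trimmed.split(":", 2);
            String table = parts[0].trim();
            if (table.isEmpty()) {
                continue; // skip malformed entries such as ":cf1"
            }
            List<String> cfs = new ArrayList<String>();
            if (parts.length == 2) {
                for (String cf : parts[1].split(",")) {
                    String name = cf.trim();
                    if (!name.isEmpty()) {
                        cfs.add(name);
                    }
                }
            }
            result.put(table, cfs);
        }
        return result;
    }
}
```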