[ 
https://issues.apache.org/jira/browse/HBASE-22839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reid Chan updated HBASE-22839:
------------------------------
    Fix Version/s:     (was: 1.7.0)
                   1.8.0

> Make sure the batches within one region are shipped to the sink clusters in 
> order (branch-1)
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-22839
>                 URL: https://issues.apache.org/jira/browse/HBASE-22839
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>    Affects Versions: 1.3.4, 1.3.5
>            Reporter: Bin Shi
>            Assignee: Bin Shi
>            Priority: Major
>             Fix For: 1.8.0
>
>
> Problem Statement:
> In the cross-cluster replication validation, we found that some cells in 
> the source and sink clusters can have the same row key and the same 
> timestamp but different values. This happens when mutations with the same 
> row key are submitted in a batch without specifying the timestamp, and the 
> same millisecond timestamp is assigned to all of them when they are 
> committed to the WAL.
> When this happens, if the major compaction hasn't run yet and you scan the 
> table, you can find cells with the same row key and the same timestamp but 
> different values, like the first three rows in the following table.
> |Row Key 1|CF0::Column 1|Timestamp 1|Value 1|
> |Row Key 1|CF0::Column 1|Timestamp 1|Value 2|
> |Row Key 1|CF0::Column 1|Timestamp 1|Value 3|
> |Row Key 2|CF0::Column 1|Timestamp 2|Value 4|
> |Row Key 3|CF0::Column 1|Timestamp 4|Value 5|
> The ordering of the first three rows is indeterminate in the presence of 
> cross-cluster replication, so after compaction, in the master cluster you 
> will see “Row Key 1, CF0::Column 1, Timestamp 1” having Value 3, but in the 
> slave cluster you might see the cell having any one of the three possible 
> values 1, 2 or 3, which results in a data inconsistency issue between the 
> master and slave clusters.
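> For illustration, here is a minimal sketch (hypothetical table and column 
> names; standard HBase 1.x client API) of how such duplicate cells can be 
> produced: two Puts for the same cell are submitted in one batch without an 
> explicit timestamp, so the server stamps both at commit time.
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> import org.apache.hadoop.hbase.TableName;
> import org.apache.hadoop.hbase.client.Connection;
> import org.apache.hadoop.hbase.client.Put;
> import org.apache.hadoop.hbase.client.Table;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class SameTimestampRepro {
>   // Two mutations for the same cell in one batch, with no explicit
>   // timestamp: both can be stamped with the same millisecond and become
>   // duplicate cells with different values, as in the table above.
>   static void repro(Connection conn) throws Exception {
>     try (Table table = conn.getTable(TableName.valueOf("t1"))) {
>       List<Put> puts = new ArrayList<>();
>       puts.add(new Put(Bytes.toBytes("rowKey1"))
>           .addColumn(Bytes.toBytes("CF0"), Bytes.toBytes("col1"),
>               Bytes.toBytes("Value1")));
>       puts.add(new Put(Bytes.toBytes("rowKey1"))
>           .addColumn(Bytes.toBytes("CF0"), Bytes.toBytes("col1"),
>               Bytes.toBytes("Value2")));
>       table.batch(puts, new Object[puts.size()]);
>     }
>   }
> }
> {code}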
> Root Cause Analysis:
> In HBaseInterClusterReplicationEndpoint.createBatches() of branch-1.3, the 
> WAL entries from the same region can be split into different batches 
> according to the replication RPC size limit, and these batches are shipped 
> by ReplicationSource concurrently, so the batches for the same region can 
> arrive at the sink in the slave cluster, and be applied to the region, in 
> an indeterminate order.
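> To make the failure mode concrete, here is a simplified sketch of 
> size-only batching (illustrative only, not the actual branch-1.3 method):
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> import org.apache.hadoop.hbase.wal.WAL.Entry;
>
> class SizeOnlyBatcher {
>   // Once the RPC size limit is hit, a region's consecutive edits spill
>   // into a new batch; since each batch is shipped by its own thread,
>   // batch N+1 can reach the sink before batch N.
>   static List<List<Entry>> createBatches(List<Entry> entries, long rpcLimit) {
>     List<List<Entry>> batches = new ArrayList<>();
>     List<Entry> current = new ArrayList<>();
>     long size = 0;
>     for (Entry e : entries) {
>       long entrySize = e.getEdit().heapSize();
>       if (!current.isEmpty() && size + entrySize > rpcLimit) {
>         batches.add(current);  // same region's edits split across batches
>         current = new ArrayList<>();
>         size = 0;
>       }
>       current.add(e);
>       size += entrySize;
>     }
>     if (!current.isEmpty()) {
>       batches.add(current);
>     }
>     return batches;            // batches are then replicated concurrently
>   }
> }
> {code}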
> Solution:
> In HBase 3.0.0 and 2.1.0, [~Apache9]&[~openinx]&[~fenghh] provided Serial 
> Replication HBASE-20046, which guarantees that the order of pushing logs to 
> the slave clusters is the same as the order of requests from clients in the 
> master cluster. It mainly contains two changes:
>  # Recording the replication "barriers" in ZooKeeper to synchronize 
> replication across old/failed RSes and new RSes, providing strict ordering 
> semantics even in the presence of a region move or RS failure.
>  # Making sure the batches within one region are shipped to the slave 
> clusters in order.
> The second change is exactly what we need, and it is the minimal change 
> that fixes the issue in this JIRA.
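> As a hedged sketch of the idea behind the second change (class and method 
> names here are hypothetical; this is not the actual HBASE-20046 patch), 
> one way to preserve per-region ordering is to route each entry by its 
> encoded region name to a fixed batch index, so all edits of one region go 
> through the same shipper thread:
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> import org.apache.hadoop.hbase.util.Bytes;
> import org.apache.hadoop.hbase.wal.WAL.Entry;
>
> class RegionOrderedBatcher {
>   // Hash each entry's encoded region name to a fixed batch, so edits of
>   // one region always land in the same batch and keep their WAL order.
>   static List<List<Entry>> createBatches(List<Entry> entries, int numBatches) {
>     List<List<Entry>> batches = new ArrayList<>(numBatches);
>     for (int i = 0; i < numBatches; i++) {
>       batches.add(new ArrayList<Entry>());
>     }
>     for (Entry e : entries) {
>       byte[] region = e.getKey().getEncodedRegionName();
>       int index = Math.abs(Bytes.hashCode(region) % numBatches);
>       batches.get(index).add(e); // same region -> same batch -> in order
>     }
>     return batches;
>   }
> }
> {code}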
> To fix the issue in this JIRA, we have two options:
>  # Cherry-pick HBASE-20046 to branch-1.3. Pros: it also fixes the data 
> inconsistency issue when there is a region move or RS failure, and it helps 
> reduce the noise in our cross-cluster replication/backup validation, which 
> is our ultimate goal. Cons: the change is big, I'm not sure yet whether it 
> is self-contained or has other dependencies that would need to be ported to 
> branch-1.3 too, and we would need a longer time to validate and stabilize 
> it.
>  # Port the minimal change, or make an equivalent change, as in the second 
> part of HBASE-20046, to make sure the batches within one region are shipped 
> to the slave clusters in order.
> With limited knowledge about the HBase release schedule and process, I 
> prefer option 2 because of the cons of option 1, but I'm open to option 1 
> and other options. Thoughts?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
