[
https://issues.apache.org/jira/browse/HBASE-22839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Kyle Purtell resolved HBASE-22839.
-----------------------------------------
Fix Version/s: (was: 1.8.0)
Assignee: (was: Bin Shi)
Resolution: Won't Fix
> Make sure the batches within one region are shipped to the sink clusters in
> order (branch-1)
> --------------------------------------------------------------------------------------------
>
> Key: HBASE-22839
> URL: https://issues.apache.org/jira/browse/HBASE-22839
> Project: HBase
> Issue Type: Improvement
> Components: Replication
> Affects Versions: 1.3.4, 1.3.5
> Reporter: Bin Shi
> Priority: Major
>
> Problem Statement:
> In cross-cluster replication validation, we found that some cells in the
> source and sink clusters can have the same row key and the same timestamp
> but different values. This happens when mutations with the same row key are
> submitted in a batch without specifying the timestamp, and the same
> millisecond timestamp is assigned to them when they are committed to the
> WAL. When this happens, if major compaction has not run yet and you scan
> the table, you can find cells that have the same row key and the same
> timestamp but different values, like the first three rows in the following
> table.
> |Row Key 1|CF0::Column 1|Timestamp 1|Value 1|
> |Row Key 1|CF0::Column 1|Timestamp 1|Value 2|
> |Row Key 1|CF0::Column 1|Timestamp 1|Value 3|
> |Row Key 2|CF0::Column 1|Timestamp 2|Value 4|
> |Row Key 3|CF0::Column 1|Timestamp 4|Value 5|
> The ordering of the first three rows is indeterminate in the presence of
> cross-cluster replication, so after compaction, in the master cluster you
> will see "Row Key 1, CF0::Column 1, Timestamp 1" having Value 3, but in the
> slave cluster the cell might end up with any of the three values 1, 2, or
> 3, which results in a data inconsistency issue between the master and slave
> clusters.
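> As a minimal sketch of how such a collision can arise (the HBase 1.x
> client API is assumed; the table name "t1" and the values are
> illustrative):
> {code:java}
> import java.io.IOException;
> import java.util.Arrays;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.TableName;
> import org.apache.hadoop.hbase.client.Connection;
> import org.apache.hadoop.hbase.client.ConnectionFactory;
> import org.apache.hadoop.hbase.client.Put;
> import org.apache.hadoop.hbase.client.Table;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class SameTimestampCollision {
>   public static void main(String[] args)
>       throws IOException, InterruptedException {
>     Configuration conf = HBaseConfiguration.create();
>     try (Connection conn = ConnectionFactory.createConnection(conf);
>          Table table = conn.getTable(TableName.valueOf("t1"))) {
>       byte[] row = Bytes.toBytes("Row Key 1");
>       byte[] cf = Bytes.toBytes("CF0");
>       byte[] col = Bytes.toBytes("Column 1");
>       // Neither Put sets a timestamp, so the region server assigns the
>       // commit time in milliseconds. If both mutations commit within the
>       // same millisecond, the two cells collide on (row, column,
>       // timestamp) and differ only in value.
>       Put p1 = new Put(row).addColumn(cf, col, Bytes.toBytes("Value 1"));
>       Put p2 = new Put(row).addColumn(cf, col, Bytes.toBytes("Value 2"));
>       table.batch(Arrays.asList(p1, p2), new Object[2]);
>     }
>   }
> }
> {code}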
> Root Cause Analysis:
> In HBaseInterClusterReplicationEndpoint.createBatches() of branch-1.3, WAL
> entries from the same region can be split into different batches according
> to the replication RPC limit, and these batches are shipped by
> ReplicationSource concurrently, so the batches for the same region can
> arrive at the sink in the slave cluster, and be applied to the region, in
> an indeterminate order.
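> To illustrate the batching behavior (a simplified sketch, not the actual
> branch-1.3 code; all names here are invented):
> {code:java}
> import java.util.ArrayList;
> import java.util.Arrays;
> import java.util.List;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
>
> public class UnorderedShipping {
>   // Split purely by the size/RPC limit, ignoring which region an entry
>   // belongs to -- consecutive edits of one region can straddle batches.
>   static List<List<String>> createBatches(List<String> entries,
>       int maxPerBatch) {
>     List<List<String>> batches = new ArrayList<>();
>     List<String> current = new ArrayList<>();
>     for (String entry : entries) {
>       current.add(entry);
>       if (current.size() == maxPerBatch) {
>         batches.add(current);
>         current = new ArrayList<>();
>       }
>     }
>     if (!current.isEmpty()) {
>       batches.add(current);
>     }
>     return batches;
>   }
>
>   public static void main(String[] args) {
>     // Two consecutive edits to region-A land in different batches.
>     List<String> wal =
>         Arrays.asList("region-A:v1", "region-A:v2", "region-B:v1");
>     ExecutorService pool = Executors.newFixedThreadPool(2);
>     for (List<String> batch : createBatches(wal, 2)) {
>       // Batches are shipped concurrently, so the sink may apply
>       // region-A:v2 before region-A:v1.
>       pool.submit(() -> System.out.println("shipping " + batch));
>     }
>     pool.shutdown();
>   }
> }
> {code}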
> Solution:
> In HBase 3.0.0 and 2.1.0, [~Apache9], [~openinx] and [~fenghh] provided
> Serial Replication (HBASE-20046), which guarantees that the order of
> pushing logs to the slave clusters is the same as the order of the client
> requests in the master cluster. It mainly contains two changes:
> # Record the replication "barriers" in ZooKeeper to synchronize
> replication across the old/failed RS and the new RS, providing strict
> ordering semantics even in the presence of a region move or an RS failure.
> # Make sure the batches within one region are shipped to the slave
> clusters in order.
> The second change is exactly what we need, and it is the minimal change
> that fixes the issue in this JIRA; a sketch follows.
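> A minimal sketch of that second part, assuming it suffices to route all
> batches of one region through the same single-threaded lane, so batches
> for different regions still ship in parallel while per-region order is
> preserved (all names here are invented):
> {code:java}
> import java.util.Arrays;
> import java.util.List;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.concurrent.TimeUnit;
>
> public class OrderedShipping {
>   private final ExecutorService[] lanes;
>
>   OrderedShipping(int parallelism) {
>     lanes = new ExecutorService[parallelism];
>     for (int i = 0; i < parallelism; i++) {
>       // A single-threaded executor drains its queue in FIFO order.
>       lanes[i] = Executors.newSingleThreadExecutor();
>     }
>   }
>
>   void ship(String encodedRegionName, List<String> batch) {
>     // All batches of one region hash to the same lane, so they are
>     // shipped -- and hence applied at the sink -- in submission order.
>     int lane = (encodedRegionName.hashCode() & Integer.MAX_VALUE)
>         % lanes.length;
>     lanes[lane].submit(
>         () -> System.out.println(encodedRegionName + " -> " + batch));
>   }
>
>   void shutdown() throws InterruptedException {
>     for (ExecutorService lane : lanes) {
>       lane.shutdown();
>       lane.awaitTermination(1, TimeUnit.MINUTES);
>     }
>   }
>
>   public static void main(String[] args) throws InterruptedException {
>     OrderedShipping shipper = new OrderedShipping(2);
>     shipper.ship("region-A", Arrays.asList("v1"));
>     shipper.ship("region-A", Arrays.asList("v2")); // cannot overtake v1
>     shipper.ship("region-B", Arrays.asList("v1")); // ships in parallel
>     shipper.shutdown();
>   }
> }
> {code}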
> To fix the issue in this JIRA, we have two options:
> # Cherry-pick HBASE-20046 to branch-1.3. Pros: it also fixes the data
> inconsistency issue when there is a region move or an RS failure, and it
> helps to reduce the noise in our cross-cluster replication/backup
> validation, which is our ultimate goal. Cons: the change is big, and I'm
> not sure yet whether it is self-contained or has other dependencies that
> would need to be ported to branch-1.3 as well; we would also need more
> time to validate and stabilize it.
> # Port the minimal change, or make a change equivalent to the second part
> of HBASE-20046, to make sure the batches within one region are shipped to
> the slave clusters in order.
> With limited knowledge of the HBase release schedule and process, I prefer
> option 2 because of the cons of option 1, but I'm open to option 1 and
> other options. Thoughts?
--
This message was sent by Atlassian Jira
(v8.20.7#820007)