Wellington Chevreuil created HBASE-20305:
--------------------------------------------
Summary: Add option to SyncTable that skips deletes on target cluster
Key: HBASE-20305
URL: https://issues.apache.org/jira/browse/HBASE-20305
Project: HBase
Issue Type: Improvement
Components: mapreduce
Affects Versions: 2.0.0-alpha-4
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil
We had a situation where two clusters with active-active replication got out of
sync, but both had data that should be kept. The tables in question never have
data deleted, but ingestion had happened on both clusters, and some rows had
even been updated.
In this scenario, a cell that is present on only one of the clusters should
not be deleted, but replayed on the other. Also, for cells with the same
identifier but different values, the most recent value should be kept. The
current version of SyncTable is not applicable here, because it simply copies
the whole state from source to target, losing any additional rows that exist
only on the target, as well as cell values that received a more recent update
there. This could be solved by adding an option to skip deletes for SyncTable.
This way, the additional cells not present on the source would still be kept.
For cells with the same identifier but different values, it would just perform
a Put for the cell version from the source, but client scans would still fetch
the version with the most recent timestamp.
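
To make the intended behaviour concrete, here is a minimal in-memory sketch of
the semantics described above. It is not the SyncTable implementation and uses
no HBase APIs: the class SkipDeletesSketch, its methods, and the skipDeletes
parameter are hypothetical names chosen for illustration. A table is modelled
as a map from cell identifier to its versions, keyed by timestamp.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: cell identifier -> (timestamp -> value). All names are hypothetical.
class SkipDeletesSketch {

    static Map<String, TreeMap<Long, String>> newTable() {
        return new HashMap<>();
    }

    // Stores one version of a cell, similar in spirit to a Put with an explicit timestamp.
    static void put(Map<String, TreeMap<Long, String>> table, String cell, long ts, String value) {
        table.computeIfAbsent(cell, k -> new TreeMap<>()).put(ts, value);
    }

    // Returns the value with the highest timestamp, as a default client read would.
    static String read(Map<String, TreeMap<Long, String>> table, String cell) {
        TreeMap<Long, String> versions = table.get(cell);
        return (versions == null || versions.isEmpty()) ? null : versions.lastEntry().getValue();
    }

    // Copies every source version to the target. Unless skipDeletes is set,
    // it also removes anything on the target that the source does not have.
    static void sync(Map<String, TreeMap<Long, String>> source,
                     Map<String, TreeMap<Long, String>> target,
                     boolean skipDeletes) {
        for (Map.Entry<String, TreeMap<Long, String>> cell : source.entrySet()) {
            for (Map.Entry<Long, String> version : cell.getValue().entrySet()) {
                put(target, cell.getKey(), version.getKey(), version.getValue());
            }
        }
        if (!skipDeletes) {
            // Drop cells that exist only on the target.
            target.keySet().retainAll(source.keySet());
            // Drop versions that exist only on the target.
            for (Map.Entry<String, TreeMap<Long, String>> cell : target.entrySet()) {
                cell.getValue().keySet().retainAll(source.get(cell.getKey()).keySet());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, TreeMap<Long, String>> source = newTable();
        Map<String, TreeMap<Long, String>> target = newTable();
        put(source, "row1/cf:q", 100L, "a");    // only on source
        put(source, "row2/cf:q", 100L, "old");  // older version on source
        put(target, "row2/cf:q", 200L, "new");  // newer version on target
        put(target, "row3/cf:q", 150L, "c");    // only on target

        sync(source, target, true);

        System.out.println(read(target, "row1/cf:q")); // replayed from source
        System.out.println(read(target, "row2/cf:q")); // newest timestamp still wins
        System.out.println(read(target, "row3/cf:q")); // target-only cell is kept
    }
}
```

With skipDeletes set to false, the same call would remove row3 and the newer
version of row2 from the target, which is the data loss described above.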
I'm attaching a patch with this additional option shortly. Please share your
thoughts.