[
https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wellington Chevreuil updated HBASE-20305:
-----------------------------------------
Status: In Progress (was: Patch Available)
> Add option to SyncTable that skip deletes on target cluster
> -----------------------------------------------------------
>
> Key: HBASE-20305
> URL: https://issues.apache.org/jira/browse/HBASE-20305
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Affects Versions: 2.0.0-alpha-4
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Minor
> Attachments: 0001-HBASE-20305.master.001.patch,
> HBASE-20305.master.002.patch
>
>
> We had a situation where two clusters with active-active replication got out
> of sync, but both had data that should be kept. The tables in question never
> have data deleted, but ingestion had happened on the two different clusters,
> and some rows had even been updated.
> In this scenario, a cell that is present on only one of the clusters should
> not be deleted, but replayed on the other. Also, for cells with the same
> identifier but different values, the most recent value should be kept.
> The current version of SyncTable would not be applicable here, because it
> would simply copy the whole state from source to target, thereby losing any
> additional rows that exist only in target, as well as target cell values that
> are more recent than those on source. This could be solved by adding an
> option to skip deletes for SyncTable. This way, the additional cells not
> present on source would still be kept. For cells with the same identifier but
> different values, it would just perform a Put for the cell version from
> source, but client scans would still fetch the version with the most recent
> timestamp.
> I'm attaching a patch with this additional option shortly. Please share your
> thoughts.
>
>
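The behaviour proposed in the description can be illustrated with a small
in-memory model. This is only a sketch under stated assumptions: it does not
use the HBase API or the attached patch, the class name SyncTableSketch, its
helper methods, and the skipDeletes flag name are all hypothetical, and a
"table" is reduced to a map from a cell identifier to its timestamped
versions.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a sync with an optional "skip deletes" mode (not HBase code). */
class SyncTableSketch {

    /** A table: cell identifier -> versions (timestamp -> value). */
    static TreeMap<String, TreeMap<Long, String>> newTable() {
        return new TreeMap<>();
    }

    /** Stores one cell version, overwriting a version with the same timestamp. */
    static void put(Map<String, TreeMap<Long, String>> table,
                    String cell, long timestamp, String value) {
        table.computeIfAbsent(cell, k -> new TreeMap<>()).put(timestamp, value);
    }

    /** What a default read returns: the value with the highest timestamp, or null. */
    static String latest(Map<String, TreeMap<Long, String>> table, String cell) {
        TreeMap<Long, String> versions = table.get(cell);
        return (versions == null || versions.isEmpty())
                ? null
                : versions.lastEntry().getValue();
    }

    /**
     * Copies every source cell version to target. Unless skipDeletes is set
     * (hypothetical flag name), target versions absent from source are removed,
     * so that target ends up as an exact copy of source.
     */
    static void sync(Map<String, TreeMap<Long, String>> source,
                     Map<String, TreeMap<Long, String>> target,
                     boolean skipDeletes) {
        // Step 1: Put each source version on target.
        for (Map.Entry<String, TreeMap<Long, String>> cell : source.entrySet()) {
            for (Map.Entry<Long, String> version : cell.getValue().entrySet()) {
                put(target, cell.getKey(), version.getKey(), version.getValue());
            }
        }
        if (skipDeletes) {
            return; // keep everything that exists only on target
        }
        // Step 2: delete target cells and versions that source does not have.
        target.entrySet().removeIf(cell -> {
            TreeMap<Long, String> sourceVersions = source.get(cell.getKey());
            if (sourceVersions == null) {
                return true;
            }
            cell.getValue().keySet().retainAll(sourceVersions.keySet());
            return cell.getValue().isEmpty();
        });
    }

    public static void main(String[] args) {
        TreeMap<String, TreeMap<Long, String>> source = newTable();
        TreeMap<String, TreeMap<Long, String>> target = newTable();
        put(source, "row1/cf:q", 10L, "older value from source");
        put(target, "row1/cf:q", 20L, "newer value from target");
        put(target, "row2/cf:q", 5L, "only on target");

        sync(source, target, true);
        System.out.println(latest(target, "row1/cf:q")); // newer value from target
        System.out.println(latest(target, "row2/cf:q")); // only on target
    }
}
```

With skipDeletes set to false, the same call would remove row2 from target and
leave only the older source version of row1, which is the data loss the
description wants to avoid.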
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)