Wellington Chevreuil created HBASE-20305:
--------------------------------------------

             Summary: Add option to SyncTable that skips deletes on target 
cluster
                 Key: HBASE-20305
                 URL: https://issues.apache.org/jira/browse/HBASE-20305
             Project: HBase
          Issue Type: Improvement
          Components: mapreduce
    Affects Versions: 2.0.0-alpha-4
            Reporter: Wellington Chevreuil
            Assignee: Wellington Chevreuil


We had a situation where two clusters with active-active replication got out of 
sync, but both had data that should be kept. The tables in question never have 
data deleted, but ingestion had happened on both clusters, and some rows had 
even been updated.

In this scenario, a cell that is present on only one of the clusters should 
not be deleted, but replayed on the other. Also, for cells with the same 
identifier but different values, the most recent value should be kept. The 
current version of SyncTable is not applicable here, because it simply copies 
the whole state from source to target, losing any additional rows that exist 
only on the target, as well as cell values that received a more recent update 
there. This could be solved by adding an option to SyncTable that skips 
deletes. That way, the additional cells not present on the source would be 
kept. For cells with the same identifier but different values, it would just 
perform a Put for the cell version from the source, and client scans would 
still fetch the version with the most recent timestamp.
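
To make the intended behaviour concrete, here is a minimal sketch in Python 
that models the two modes on plain dictionaries. It is not the SyncTable 
implementation and does not use the HBase API; the names sync, read_latest and 
skip_deletes are hypothetical and chosen only for illustration. It assumes a 
table can be modelled as a map from (row, column) to timestamped versions, and 
that a read returns the version with the highest timestamp.

```python
from typing import Dict, Optional, Tuple

# Assumed model: {(row, column): {timestamp: value}}. Not an HBase structure.
Table = Dict[Tuple[str, str], Dict[int, str]]


def sync(source: Table, target: Table, skip_deletes: bool) -> Table:
    """Return the target's state after a one-way sync from source (hypothetical helper)."""
    result = {cell: dict(versions) for cell, versions in target.items()}
    # Put every cell version held by the source onto the target.
    for cell, versions in source.items():
        result.setdefault(cell, {}).update(versions)
    if not skip_deletes:
        # Current behaviour as described above: drop whatever the source lacks.
        for cell in list(result):
            source_versions = source.get(cell, {})
            kept = {ts: v for ts, v in result[cell].items() if ts in source_versions}
            if kept:
                result[cell] = kept
            else:
                del result[cell]
    return result


def read_latest(table: Table, cell: Tuple[str, str]) -> Optional[str]:
    """Return the value with the most recent timestamp, as a client scan would."""
    versions = table.get(cell)
    return versions[max(versions)] if versions else None


source = {("row1", "cf:q"): {100: "old"}, ("row2", "cf:q"): {150: "a"}}
target = {("row1", "cf:q"): {200: "new"}, ("row3", "cf:q"): {120: "target-only"}}

merged = sync(source, target, skip_deletes=True)
copied = sync(source, target, skip_deletes=False)
```

In this model, merged keeps row3 and still reads "new" for row1, because the 
Put of the older source version does not shadow the newer timestamp, while 
copied ends up identical to source.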

I'm attaching a patch with this additional option shortly. Please share your 
thoughts.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
