Hi,

We have found an issue with HBase cluster replication: delete operations issued between stop_replication and start_replication are not replicated correctly. We are using CDH3u2 and testing cluster replication on coexisting clusters (each has 1 master and 5 regionservers).

Example:

  hbase(main):001:0> stop_replication
  hbase(main):002:0> put 'test', 'r1', 't:c1', 1
  hbase(main):003:0> put 'test', 'r2', 't:', 2
  hbase(main):004:0> delete 'test', 'r1', 't:c1'
  hbase(main):005:0> delete 'test', 'r2', 't:'
  hbase(main):006:0> start_replication

After executing the above commands in the master cluster's hbase shell, a scan on the master returns no rows:

  hbase(main):007:0> scan 'test'
  ROW                COLUMN+CELL
  0 row(s) in 0.0180 seconds

However, one cell remains in the slave cluster:

  hbase(main):001:0> scan 'test'
  ROW                COLUMN+CELL
   r1                column=t:c1, timestamp=1324893943268, value=1
  1 row(s) in 0.3580 seconds

In addition, we can resolve the issue with either of the following changes:

1. Modifying ReplicationSink.replicateEntries() so that it does not batch puts (i.e. it replicates WAL entries one by one).
2. Changing the delete method used in ReplicationSink.replicateEntries() from Delete.deleteColumn() to Delete.deleteColumns().

However, I'm not sure whether these modifications are correct or whether they affect other parts of the code. We would appreciate any comments on this issue.

Regards
--
Teruyoshi Zenmyo
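
P.S. To illustrate why either change above might help, here is a toy model of delete markers. It is only a sketch and not HBase code. It assumes that (a) the slave applies the delete before the batched put, (b) a version-level marker, as written by Delete.deleteColumn(), hides only the cell whose timestamp equals the marker's own, and (c) a column-level marker, as written by Delete.deleteColumns(), hides every cell at or before the marker's timestamp. The class DeleteMarkerSketch, its methods, and the timestamps 100 and 200 are all made up for illustration; I have not confirmed that this is what actually happens inside ReplicationSink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DeleteMarkerSketch {

    /** Toy model of one column in one row: timestamped values plus delete markers. */
    static final class Column {
        private final TreeMap<Long, String> puts = new TreeMap<>();
        private final List<Long> versionMarkers = new ArrayList<>(); // hide exactly one timestamp
        private final List<Long> columnMarkers = new ArrayList<>();  // hide all timestamps <= marker

        void put(long ts, String value) { puts.put(ts, value); }
        void deleteVersion(long ts) { versionMarkers.add(ts); }
        void deleteColumns(long ts) { columnMarkers.add(ts); }

        /** Returns the newest value not hidden by any marker, or null if none is visible. */
        String latestVisible() {
            for (Map.Entry<Long, String> e : puts.descendingMap().entrySet()) {
                long ts = e.getKey();
                boolean masked = versionMarkers.contains(ts);
                for (long marker : columnMarkers) {
                    if (marker >= ts) {
                        masked = true;
                    }
                }
                if (!masked) {
                    return e.getValue();
                }
            }
            return null;
        }
    }

    /**
     * Replays the assumed slave-side order: the delete is applied first,
     * then the batched put arrives carrying its original (older) timestamp.
     */
    static String replay(boolean useDeleteColumns) {
        final long putTs = 100L;   // hypothetical timestamp assigned on the master
        final long applyTs = 200L; // hypothetical time at which the slave applies the delete
        Column column = new Column();
        if (useDeleteColumns) {
            column.deleteColumns(applyTs);
        } else {
            column.deleteVersion(applyTs);
        }
        column.put(putTs, "1");
        return column.latestVisible();
    }

    public static void main(String[] args) {
        System.out.println("version-level marker: " + replay(false));
        System.out.println("column-level marker:  " + replay(true));
    }
}
```

In this model the version-level marker leaves the value "1" visible, which matches the leftover r1 cell on our slave, while the column-level marker hides it. Applying the put before the delete (change 1) would also avoid the leftover cell, because the delete would then find an existing version to target.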
