[https://issues.apache.org/jira/browse/HBASE-9158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734308#comment-13734308]
Hudson commented on HBASE-9158:
-------------------------------
FAILURE: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #658 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/658/])
HBASE-9158 Serious bug in cyclic replication (larsh: rev 1512089)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestMasterReplication.java
> Serious bug in cyclic replication
> ---------------------------------
>
> Key: HBASE-9158
> URL: https://issues.apache.org/jira/browse/HBASE-9158
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0, 0.95.1, 0.94.10
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Priority: Critical
> Fix For: 0.98.0, 0.95.2, 0.94.11
>
> Attachments: 9158-0.94.txt, 9158-0.94-v2.txt, 9158-0.94-v3.txt,
> 9158-0.94-v4.txt, 9158-trunk-v1.txt, 9158-trunk-v2.txt, 9158-trunk-v3.txt,
> 9158-trunk-v4.txt
>
>
> While studying the code for HBASE-7709, I found a serious bug in the current
> cyclic replication code. The problem is here in HRegion.doMiniBatchMutation:
> {code}
> Mutation first = batchOp.operations[firstIndex].getFirst();
> // The whole walEdit is tagged with the cluster id of the first mutation only.
> txid = this.log.appendNoSync(regionInfo,
>     this.htableDescriptor.getName(),
>     walEdit, first.getClusterId(), now, this.htableDescriptor);
> {code}
> Note that edits replicated from a remote cluster and local edits might
> interleave in the WAL, and we might also receive edits from multiple remote
> clusters. Hence a single walEdit might contain edits from many clusters, but
> all of them are labeled with the clusterId of the first Mutation.
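To make the mislabeling concrete, here is a minimal Java sketch. It does not use HBase classes: `Mutation` is a hypothetical stand-in that carries only a row key and an originating cluster id (modeled as a `java.util.UUID`, which is an assumption about the id type), and `walEditClusterId` mimics the quoted code by taking the id of the first mutation for the whole batch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

// Hypothetical model of the labeling problem; not HBase code.
class MiniBatchLabelingSketch {

    // Stand-in for a mutation: a row key plus the id of the cluster it came from.
    static final class Mutation {
        final String row;
        final UUID clusterId;

        Mutation(String row, UUID clusterId) {
            this.row = row;
            this.clusterId = clusterId;
        }
    }

    // Mimics the quoted code: the whole batch is labeled with the
    // cluster id of its first mutation.
    static UUID walEditClusterId(List<Mutation> batch) {
        return batch.get(0).clusterId;
    }

    // Counts mutations whose actual origin differs from the batch label.
    static int mislabeled(List<Mutation> batch) {
        UUID label = walEditClusterId(batch);
        int count = 0;
        for (Mutation m : batch) {
            if (!m.clusterId.equals(label)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        UUID local = UUID.randomUUID();
        UUID remote = UUID.randomUUID();
        // A replicated edit arrives first, followed by an interleaved local write.
        List<Mutation> batch = Arrays.asList(
                new Mutation("r1", remote),
                new Mutation("r2", local),
                new Mutation("r3", remote));
        // The local edit "r2" ends up labeled with the remote cluster's id.
        System.out.println(mislabeled(batch)); // prints 1
    }
}
```

Assuming, as cyclic replication generally does, that an edit's cluster id is used to avoid shipping it back to the cluster it came from, a local edit carrying a remote id (like `r2` above) could be withheld from the very cluster it was meant to reach.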
> Fixing this in doMiniBatchMutation seems tricky to do efficiently: imagine we
> get a batch with cluster1, cluster2, cluster1, cluster2, ...; in that case
> each edit would have to be its own batch. The coprocessor handling would
> also be difficult.
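The efficiency concern can be illustrated with a small sketch (a hypothetical helper, not HBase code): if every WAL entry must carry a single cluster id and the original order must be preserved, a batch can only be cut into runs of consecutive same-cluster mutations, so an alternating batch degenerates into one sub-batch per edit. Cluster ids are plain strings here for readability.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of order-preserving splitting; not HBase code.
class RunSplittingSketch {

    // Cuts the batch wherever the cluster id changes, keeping the original order.
    static List<List<String>> splitIntoRuns(List<String> clusterIds) {
        List<List<String>> runs = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String id : clusterIds) {
            if (!current.isEmpty() && !current.get(0).equals(id)) {
                runs.add(current);
                current = new ArrayList<>();
            }
            current.add(id);
        }
        if (!current.isEmpty()) {
            runs.add(current);
        }
        return runs;
    }

    public static void main(String[] args) {
        // Alternating origins: every edit becomes its own sub-batch.
        List<String> alternating = Arrays.asList("cluster1", "cluster2", "cluster1", "cluster2");
        System.out.println(splitIntoRuns(alternating).size()); // prints 4
        // Already-grouped origins: only one cut is needed.
        List<String> grouped = Arrays.asList("cluster1", "cluster1", "cluster2", "cluster2");
        System.out.println(splitIntoRuns(grouped).size()); // prints 2
    }
}
```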
> The other option is to create batches of Puts grouped by cluster id in
> ReplicationSink.replicateEntries(...). This is not as general, but equally
> correct, and it is the approach I would favor.
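The favored approach can be sketched as follows, using a hypothetical `Edit` type in place of the actual Puts handled by `ReplicationSink.replicateEntries(...)`: incoming edits are bucketed by originating cluster id, so every batch later handed to the region server contains edits from exactly one cluster, and labeling the batch with its first mutation's id is then correct. This illustrates the idea only; it is not the committed patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of grouping replicated edits by cluster id; not HBase code.
class GroupByClusterSketch {

    // Stand-in for a replicated Put: originating cluster id plus a row key.
    static final class Edit {
        final UUID clusterId;
        final String row;

        Edit(UUID clusterId, String row) {
            this.clusterId = clusterId;
            this.row = row;
        }
    }

    // Buckets edits by cluster id. LinkedHashMap keeps clusters in first-seen
    // order, and each bucket keeps its edits in their original relative order.
    static Map<UUID, List<Edit>> groupByCluster(List<Edit> entries) {
        Map<UUID, List<Edit>> batches = new LinkedHashMap<>();
        for (Edit e : entries) {
            batches.computeIfAbsent(e.clusterId, k -> new ArrayList<>()).add(e);
        }
        return batches;
    }

    public static void main(String[] args) {
        UUID cluster1 = UUID.randomUUID();
        UUID cluster2 = UUID.randomUUID();
        List<Edit> incoming = Arrays.asList(
                new Edit(cluster1, "a"),
                new Edit(cluster2, "b"),
                new Edit(cluster1, "c"),
                new Edit(cluster2, "d"));
        // Each value would be submitted as its own single-cluster batch.
        for (Map.Entry<UUID, List<Edit>> batch : groupByCluster(incoming).entrySet()) {
            System.out.println(batch.getKey() + " -> " + batch.getValue().size() + " edits");
        }
    }
}
```

With this grouping, the alternating cluster1, cluster2, cluster1, cluster2 case yields two batches rather than one per edit. The trade-off in this sketch is that edits from different clusters are no longer applied in strict arrival order; the sketch does not address whether that matters.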
> Lastly, this is very hard to verify in a unit test.