[
https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206881#comment-13206881
]
Jeremy Hanna commented on CASSANDRA-3843:
-----------------------------------------
We'll be upgrading to 1.0.8 as soon as we can, but this seems like a
significant issue for anyone doing range scans - does it make sense to backport
to 0.8.x?
> Unnecessary ReadRepair request during RangeScan
> ------------------------------------------------
>
> Key: CASSANDRA-3843
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3843
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.0.0
> Reporter: Philip Andronov
> Assignee: Jonathan Ellis
> Fix For: 1.0.8
>
> Attachments: 3843-v2.txt, 3843.txt
>
>
> When reading at Quorum consistency with a replication factor greater than 2,
> Cassandra sends at least one ReadRepair, even when none is needed.
> Because read requests wait until the ReadRepair finishes, this slows
> requests down a lot, up to the point of a timeout :(
> The problem seems to have been introduced by CASSANDRA-2494.
> Unfortunately I do not have enough knowledge of Cassandra internals to fix the
> problem without breaking the CASSANDRA-2494 functionality, so this report
> comes without a patch.
> Code explanations:
> {code:title=RangeSliceResponseResolver.java|borderStyle=solid}
> class RangeSliceResponseResolver {
>     // ....
>     private class Reducer extends MergeIterator.Reducer<Pair<Row,InetAddress>, Row>
>     {
>         // ....
>         protected Row getReduced()
>         {
>             ColumnFamily resolved = versions.size() > 1
>                                   ? RowRepairResolver.resolveSuperset(versions)
>                                   : versions.get(0);
>             if (versions.size() < sources.size())
>             {
>                 for (InetAddress source : sources)
>                 {
>                     if (!versionSources.contains(source))
>                     {
>                         // [PA] Here we are adding null ColumnFamily.
>                         // later it will be compared with the "desired"
>                         // version and will give us "fake" difference which
>                         // forces Cassandra to send ReadRepair to a given source
>                         versions.add(null);
>                         versionSources.add(source);
>                     }
>                 }
>             }
>             // ....
>             if (resolved != null)
>                 repairResults.addAll(RowRepairResolver.scheduleRepairs(resolved, table, key, versions, versionSources));
>             // ....
>         }
>     }
> }
> {code}
> {code:title=RowRepairResolver.java|borderStyle=solid}
> public class RowRepairResolver extends AbstractRowResolver {
>     // ....
>     public static List<IAsyncResult> scheduleRepairs(ColumnFamily resolved, String table, DecoratedKey<?> key, List<ColumnFamily> versions, List<InetAddress> endpoints)
>     {
>         List<IAsyncResult> results = new ArrayList<IAsyncResult>(versions.size());
>         for (int i = 0; i < versions.size(); i++)
>         {
>             // On some iteration we compare null with resolved, which are
>             // obviously not equal, so a ReadRepair request is fired even
>             // though it is not needed here
>             ColumnFamily diffCf = ColumnFamily.diff(versions.get(i), resolved);
>             if (diffCf == null)
>                 continue;
>             // ....
> {code}
> Imagine the following situation:
> NodeA has X.1 // row X with version 1
> NodeB has X.2
> NodeC has X.? // unknown version, but because the write was at Quorum it is 1 or 2
> During a Quorum read from nodes A and B, Cassandra creates version 12 and
> sends a ReadRepair, so the nodes now have the following content:
> NodeA has X.12
> NodeB has X.12
> This is correct; however, Cassandra will also fire a ReadRepair to NodeC. There
> is no need to do that: the next consistent read may be served by
> nodes {A, B} (no ReadRepair) or by a pair {?, C}, in which case a ReadRepair
> will be fired and bring NodeC to a consistent state.
> Right now we are reading from the Index a lot, and from some point in
> time on we get TimeOutException because the cluster is overloaded by
> ReadRepair requests *even* if all nodes have the same data :(
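
The mechanism described in the report can be reproduced with a toy model. The sketch below is not Cassandra code: the class FakeRepairSketch, its methods, and the map-based row representation (column name -> timestamp) are all hypothetical stand-ins, assuming only that a diff against a null version reports the whole resolved row as missing. The skipPlaceholders flag exists only to contrast the two counts; it is not what the attached patches do, and skipping placeholders unconditionally would also suppress the repair that CASSANDRA-2494 added for replicas that really lack the row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT Cassandra code: a row version is a map of column name -> timestamp.
class FakeRepairSketch {

    // Merge all non-null versions, keeping the newest timestamp per column.
    static Map<String, Long> resolveSuperset(List<Map<String, Long>> versions) {
        Map<String, Long> resolved = new HashMap<>();
        for (Map<String, Long> v : versions) {
            if (v == null) continue;
            for (Map.Entry<String, Long> e : v.entrySet()) {
                resolved.merge(e.getKey(), e.getValue(), Math::max);
            }
        }
        return resolved;
    }

    // Columns of `resolved` that `version` lacks or holds with an older timestamp.
    // Returns null when nothing is missing. A null version is assumed to be
    // missing everything, which is what produces the "fake" difference.
    static Map<String, Long> diff(Map<String, Long> version, Map<String, Long> resolved) {
        if (version == null) return resolved.isEmpty() ? null : resolved;
        Map<String, Long> missing = new HashMap<>();
        for (Map.Entry<String, Long> e : resolved.entrySet()) {
            Long have = version.get(e.getKey());
            if (have == null || have < e.getValue()) missing.put(e.getKey(), e.getValue());
        }
        return missing.isEmpty() ? null : missing;
    }

    // Count how many repairs would be scheduled for the given versions.
    // skipPlaceholders is a hypothetical switch used only for comparison.
    static int countRepairs(List<Map<String, Long>> versions, boolean skipPlaceholders) {
        Map<String, Long> resolved = resolveSuperset(versions);
        int repairs = 0;
        for (Map<String, Long> v : versions) {
            if (skipPlaceholders && v == null) continue;
            if (diff(v, resolved) != null) repairs++;
        }
        return repairs;
    }

    public static void main(String[] args) {
        Map<String, Long> row = new HashMap<>();
        row.put("col", 1L);

        // Two sources returned identical data; the third slot is a null placeholder.
        List<Map<String, Long>> versions = new ArrayList<>();
        versions.add(row);
        versions.add(new HashMap<>(row));
        versions.add(null);

        System.out.println(countRepairs(versions, false)); // prints 1
        System.out.println(countRepairs(versions, true));  // prints 0
    }
}
```

In this model the two real versions agree, so the single repair counted in the first call comes entirely from the null placeholder.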