[
https://issues.apache.org/jira/browse/CASSANDRA-6866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935582#comment-13935582
]
Oleg Anastasyev edited comment on CASSANDRA-6866 at 3/14/14 8:38 PM:
---------------------------------------------------------------------
Attached a 99th-percentile read latency graph showing how making no digest read
requests lowers latency even under normal conditions. Traffic per node is of
course 15-20% (theoretically up to 30%) higher.
Latency is in nanoseconds.
> Read repair path of quorum reads makes the cluster time out all requests
> under load
> ---------------------------------------------------------------------------------
>
> Key: CASSANDRA-6866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6866
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Oleg Anastasyev
> Attachments: ReadRepairPathFixExample.txt,
> Read_Latency__2data___digest_vs_3_data__-_99_.png
>
>
> Current implementation of the read repair path for quorum reads is:
> 1. request data from 1 or 2 endpoints; request digests from the others.
> 2. compare digests; on mismatch, throw DigestMismatchException.
> 3. request data from all contacted replicas at CL.ALL.
> 4. prepare read repairs; send mutations.
> 5. wait for all mutations to ack.
> 6. retry the read and prepare the result.
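>
> As an illustration only (a simplified sketch with assumed helper names such as
> readData, readDigest, readDataFromAll, diff and sendAndWaitForAllAcks - not the
> actual StorageProxy/ReadCallback code), that flow looks roughly like:
>
>   Row readWithRepair(List<InetAddress> replicas) {
>       Row data = readData(replicas.get(0));                  // 1. one data request, digests from the rest
>       boolean mismatch = false;
>       for (InetAddress ep : replicas.subList(1, replicas.size()))
>           mismatch |= !readDigest(ep).equals(digest(data));  // 2. compare digests
>       if (!mismatch)
>           return data;
>       // (DigestMismatchException is thrown at this point in the real code)
>       List<Row> rows = readDataFromAll(replicas);            // 3. CL.ALL re-read: blocks on every replica
>       List<Mutation> repairs = diff(rows);                   // 4. prepare read repairs
>       sendAndWaitForAllAcks(repairs, replicas);              // 5. blocks on every replica again
>       return readData(replicas.get(0));                      // 6. retry the read
>   }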
> The main problem is in step 3 (step 5 is not good either). This is because any
> of the contacted endpoints can go down, yet not be known to be down, while this
> is executing.
> So, if a noticeable amount of read repair is happening (shortly after a rack of
> nodes has started up, for example), waiting on CL.ALL and on acks of RR
> mutations from not-yet-known-to-be-down endpoints quickly occupies all client
> thread pools on all nodes, and the cluster becomes unavailable.
> This also makes (otherwise successful) reads time out from time to time even
> under light load on the cluster, just because of a temporary hiccup on the
> network or a GC pause on a single endpoint.
> I do not have a generic solution for this; I fixed it in a way that is
> appropriate for us - always using the speculative retry policy, patched to make
> data requests only (no digests) and to do read repair on that data at once (not
> requesting it again). This way, not-yet-known-to-be-down endpoints simply do
> not respond to data requests, so the rest of the read repair path does not
> contact them at all.
> I attached my patch here for illustration.
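>
> Roughly, the shape of that fix (again only a sketch with assumed helper names;
> the attached patch shows the real change) is:
>
>   Row readDataOnly(List<InetAddress> replicas) {
>       // send data (not digest) requests to every replica and proceed as soon as
>       // a quorum has answered, so a not-yet-known-to-be-down endpoint simply
>       // never joins the response set
>       List<Row> rows = readDataFromQuorum(replicas);
>       // repair directly from the rows already in hand - no CL.ALL re-read
>       List<Mutation> repairs = diff(rows);
>       // only the replicas that actually answered receive repair mutations
>       sendRepairs(repairs, respondedReplicas(rows));
>       return resolve(rows);
>   }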