[ https://issues.apache.org/jira/browse/CASSANDRA-6866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Anastasyev updated CASSANDRA-6866:
---------------------------------------

    Attachment: Read_Latency__2data___digest_vs_3_data__-_99_.png

Attached a 99th percentile read latency graph showing how making no digest read 
requests lowers latency under normal conditions. Traffic per node is 15-20% 
(theoretically up to 30%) higher, of course.

Latency is in nanoseconds.

> Read repair path of quorum reads makes the cluster time out all requests 
> under load
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6866
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6866
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Oleg Anastasyev
>         Attachments: ReadRepairPathFixExample.txt, 
> Read_Latency__2data___digest_vs_3_data__-_99_.png
>
>
> Current implementation of the read repair path for quorum reads is (sketched 
> below):
> 1. request data from 1 or 2 endpoints; request digests from the others.
> 2. compare digests; throw DigestMismatchException on mismatch.
> 3. request data from all contacted replicas at CL.ALL.
> 4. prepare read repairs; send mutations.
> 5. wait for all mutations to ack.
> 6. retry the read and prepare the result.
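> A minimal sketch of that flow, with simplified stand-in types (Replica, 
> ReadResponse, hashCode in place of the real MD5 digest; the actual code lives 
> in StorageProxy and friends and is more involved):
> {code}
> import java.util.List;
> 
> // Simplified stand-ins, not the actual org.apache.cassandra classes.
> class ReadResponse {
>     final String data;
>     final long timestamp;
>     ReadResponse(String data, long timestamp) { this.data = data; this.timestamp = timestamp; }
>     int digest() { return data.hashCode(); } // stand-in for the real MD5 digest
> }
> 
> interface Replica {
>     ReadResponse readData();            // full data read; blocks until rpc timeout if the node is dead
>     int readDigest();                   // digest-only read
>     void applyMutation(ReadResponse r); // read-repair mutation; the caller waits for the ack
> }
> 
> class QuorumReadPath {
>     ReadResponse read(List<Replica> contacted) {
>         // Step 1: data from one replica, digests from the others.
>         ReadResponse data = contacted.get(0).readData();
>         boolean mismatch = false;
>         for (Replica r : contacted.subList(1, contacted.size()))
>             if (r.readDigest() != data.digest()) { mismatch = true; break; } // step 2
>         if (!mismatch) return data;
> 
>         // Step 3: data from ALL contacted replicas -- waits on every endpoint,
>         // including ones that are dead but not yet marked down.
>         ReadResponse newest = data;
>         for (Replica r : contacted) {
>             ReadResponse resp = r.readData();
>             if (resp.timestamp > newest.timestamp) newest = resp;
>         }
>         // Steps 4-5: send repair mutations and wait for every ack.
>         for (Replica r : contacted) r.applyMutation(newest);
>         // Step 6: retry the read and prepare the result.
>         return contacted.get(0).readData();
>     }
> }
> {code}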
> The main problem is in step 3 (though step 5 is not good either): any of the 
> endpoints can have gone down, but not yet be known to be down, while this 
> executes.
> So, if a noticeable amount of read repair is happening (shortly after a rack 
> of nodes started up, for example), waiting on CL.ALL and on the acks of RR 
> mutations from not-yet-known-to-be-down endpoints quickly occupies all client 
> thread pools on all nodes, and the cluster becomes unavailable.
> This also makes (otherwise successful) reads time out from time to time even 
> under light cluster load, just because of a temporary network hiccup or GC 
> pause on a single endpoint.
> I do not have a generic solution for this; I fixed it in a way that is 
> appropriate for us: always using the speculative retry policy, patched to 
> make data requests only (no digests) and to do read repair on that data at 
> once (instead of requesting it again). This way, not-yet-known-to-be-down 
> endpoints simply do not respond to the data requests, so the rest of the read 
> repair path never contacts them at all.
> I attached my patch here for illustration.
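> A rough sketch of the idea, reusing the stand-in types from the sketch above 
> (an illustration only, not the attached patch):
> {code}
> import java.util.ArrayList;
> import java.util.List;
> 
> class DataOnlyReadPath {
>     ReadResponse read(List<Replica> targets, int quorum) {
>         // Data requests (never digests) to all targets; collect the first
>         // `quorum` answers and simply skip endpoints that stay silent.
>         List<Replica> responded = new ArrayList<>();
>         List<ReadResponse> responses = new ArrayList<>();
>         for (Replica r : targets) {
>             ReadResponse resp = tryRead(r);
>             if (resp == null) continue; // not-yet-known-to-be-down node: never waited on again
>             responded.add(r);
>             responses.add(resp);
>             if (responses.size() == quorum) break;
>         }
>         // Resolve the newest version from the data already in hand
>         // (a real implementation would time out if fewer than quorum respond).
>         ReadResponse newest = responses.get(0);
>         for (ReadResponse resp : responses)
>             if (resp.timestamp > newest.timestamp) newest = resp;
>         // Repair only the replicas that answered with stale data; no second
>         // read round at CL.ALL, no waiting on acks from silent endpoints.
>         for (int i = 0; i < responded.size(); i++)
>             if (responses.get(i).timestamp < newest.timestamp)
>                 responded.get(i).applyMutation(newest);
>         return newest;
>     }
> 
>     private ReadResponse tryRead(Replica r) {
>         try { return r.readData(); }   // assume readData throws on rpc timeout
>         catch (RuntimeException timeout) { return null; }
>     }
> }
> {code}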



--
This message was sent by Atlassian JIRA
(v6.2#6252)
