Simon Zhou created CASSANDRA-13261:
--------------------------------------

             Summary: Improve speculative retry to avoid being overloaded
                 Key: CASSANDRA-13261
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13261
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Simon Zhou
            Assignee: Simon Zhou


In CASSANDRA-13009, it was suggested that I split out the second part of my 
patch as a separate improvement.

The goal is to keep Cassandra from being overloaded when a CUSTOM speculative 
retry parameter is used. Steps to reason about and reproduce this on 3.0.10:
1. Set a custom speculative retry threshold, for example:
cqlsh> alter TABLE to_repair1.users0 with speculative_retry='10ms';

2. SpeculatingReadExecutor will be used, according to this piece of code in 
AbstractReadExecutor:
{code}
        if (retry.equals(SpeculativeRetryParam.ALWAYS))
            return new AlwaysSpeculatingReadExecutor(keyspace, cfs, command, consistencyLevel, targetReplicas);
        else // PERCENTILE or CUSTOM.
            return new SpeculatingReadExecutor(keyspace, cfs, command, consistencyLevel, targetReplicas);
{code}

3. With RF=3 and LOCAL_QUORUM, the code below (from 
SpeculatingReadExecutor#maybeTryAdditionalReplicas) does not protect Cassandra 
from being overloaded, even though the inline comment suggests that is the 
intent:

{code}
            // no latency information, or we're overloaded
            if (cfs.sampleLatencyNanos > TimeUnit.MILLISECONDS.toNanos(command.getTimeout()))
                return;
{code}

The reason is that cfs.sampleLatencyNanos is assigned retryPolicy.threshold() 
at line 405 of ColumnFamilyStore, which is the 10ms configured in step 1 above. 
The timeout, however, is very often the default of 5000ms, so with these 
settings the condition is never true and the guard never returns early.
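
To make the arithmetic concrete, here is a minimal, self-contained sketch of 
that comparison using the values from the steps above. The class and method 
names are made up for illustration and are not Cassandra code; only the shape 
of the comparison is taken from the snippet quoted earlier.

```java
import java.util.concurrent.TimeUnit;

// Illustrative only: not part of Cassandra.
final class OverloadGuardDemo {
    // Same shape as the guard in maybeTryAdditionalReplicas: returns true
    // when the guard would fire and speculation would be skipped.
    static boolean guardTrips(long sampleLatencyNanos, long timeoutMillis) {
        return sampleLatencyNanos > TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    }

    public static void main(String[] args) {
        // speculative_retry='10ms' ends up in sampleLatencyNanos.
        long customThresholdNanos = TimeUnit.MILLISECONDS.toNanos(10);
        // Default timeout mentioned above.
        long defaultTimeoutMillis = 5000;

        // 10,000,000 ns is compared against 5,000,000,000 ns.
        System.out.println(guardTrips(customThresholdNanos, defaultTimeoutMillis)); // prints "false"
    }
}
```

Since the left-hand side is a fixed configuration value rather than a 
measurement, the result does not change no matter how slow the replicas 
actually are.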

As the name suggests, sampleLatencyNanos should hold a sampled latency, not a 
statically configured value. My proposal:
a. Introduce the option -Dcassandra.overload.threshold to allow customizing 
the overload threshold. The default threshold would be 
DatabaseDescriptor.getRangeRpcTimeout().
b. Assign the sampled P99 latency to cfs.sampleLatencyNanos. For overload 
detection, simply compare cfs.sampleLatencyNanos with the customizable 
threshold above.
c. Use retryDelayNanos (instead of cfs.sampleLatencyNanos) as the wait time 
before a retry (see line 282 of AbstractReadExecutor). This is the value from 
the table setting (PERCENTILE or CUSTOM).
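
A rough sketch of how points a to c could fit together is below. It is not the 
patch itself: the class and method names are hypothetical, the millisecond 
unit of the system property is an assumption, and the default timeout is 
passed in as a plain parameter instead of being read from DatabaseDescriptor.

```java
import java.util.OptionalLong;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the proposal; none of these names exist in Cassandra.
final class SpeculativeRetrySketch {
    // (a) Overload threshold from -Dcassandra.overload.threshold, assumed to
    // be in milliseconds, falling back to the supplied default timeout.
    static long overloadThresholdNanos(long defaultTimeoutMillis) {
        long millis = Long.getLong("cassandra.overload.threshold", defaultTimeoutMillis);
        return TimeUnit.MILLISECONDS.toNanos(millis);
    }

    // (b) + (c) Compare the sampled P99 latency with the overload threshold.
    // If overloaded, return empty (do not speculate); otherwise return the
    // wait time taken from the table setting (retryDelayNanos).
    static OptionalLong speculationDelayNanos(long sampledP99Nanos,
                                              long retryDelayNanos,
                                              long overloadThresholdNanos) {
        if (sampledP99Nanos > overloadThresholdNanos)
            return OptionalLong.empty();
        return OptionalLong.of(retryDelayNanos);
    }

    public static void main(String[] args) {
        long threshold = overloadThresholdNanos(5000);
        long retryDelay = TimeUnit.MILLISECONDS.toNanos(10); // speculative_retry='10ms'

        // Healthy: sampled P99 of 20ms is below the threshold, so wait 10ms and retry.
        System.out.println(speculationDelayNanos(TimeUnit.MILLISECONDS.toNanos(20), retryDelay, threshold));
        // Overloaded: sampled P99 of 6s exceeds a 5s threshold, so skip speculation.
        System.out.println(speculationDelayNanos(TimeUnit.SECONDS.toNanos(6), retryDelay, threshold));
    }
}
```

The point of keeping the two values separate is that the configured retry 
delay controls only when a speculative read is sent, while the measured 
latency alone decides whether sending one is safe.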



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
