Simon Zhou created CASSANDRA-13261:
--------------------------------------
Summary: Improve speculative retry to avoid being overloaded
Key: CASSANDRA-13261
URL: https://issues.apache.org/jira/browse/CASSANDRA-13261
Project: Cassandra
Issue Type: Improvement
Reporter: Simon Zhou
Assignee: Simon Zhou
In CASSANDRA-13009, it was suggested that I split out the 2nd part of my patch
as a separate improvement.
The goal is to keep Cassandra from being overloaded when the CUSTOM speculative
retry parameter is used. Steps to reason about/reproduce this with 3.0.10:
1. Set a custom speculative retry threshold, for example:
cqlsh> alter TABLE to_repair1.users0 with speculative_retry='10ms';
2. SpeculatingReadExecutor will be used, according to this piece of code in
AbstractReadExecutor:
{code}
if (retry.equals(SpeculativeRetryParam.ALWAYS))
    return new AlwaysSpeculatingReadExecutor(keyspace, cfs, command, consistencyLevel, targetReplicas);
else // PERCENTILE or CUSTOM.
    return new SpeculatingReadExecutor(keyspace, cfs, command, consistencyLevel, targetReplicas);
{code}
3. When RF=3 and LOCAL_QUORUM is used, the code below (from
SpeculatingReadExecutor#maybeTryAdditionalReplicas) does not protect Cassandra
from being overloaded, even though the inline comment suggests that is its
intent:
{code}
// no latency information, or we're overloaded
if (cfs.sampleLatencyNanos > TimeUnit.MILLISECONDS.toNanos(command.getTimeout()))
    return;
{code}
The reason is that cfs.sampleLatencyNanos is assigned retryPolicy.threshold()
(at line 405 of ColumnFamilyStore), which is the 10ms configured in step #1
above. The timeout, however, is often the default of 5000ms, so the comparison
is never true and the early return is never taken.
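
To make the arithmetic concrete, here is a minimal stand-alone sketch of that
comparison. It is not Cassandra code: the class and method names are
hypothetical, and only the shape of the check and the 10ms/5000ms values come
from the steps above.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-alone demo; it only mirrors the shape of the guard in
// SpeculatingReadExecutor#maybeTryAdditionalReplicas.
class StaticThresholdDemo
{
    // Returns true when the guard would skip the speculative retry.
    static boolean skipsSpeculation(long sampleLatencyNanos, long timeoutMillis)
    {
        return sampleLatencyNanos > TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    }

    public static void main(String[] args)
    {
        // Value stored in sampleLatencyNanos when speculative_retry='10ms'.
        long configuredNanos = TimeUnit.MILLISECONDS.toNanos(10);
        // Default timeout mentioned above.
        long timeoutMillis = 5000;

        // 10ms is never greater than 5000ms, so the guard does not fire.
        System.out.println(skipsSpeculation(configuredNanos, timeoutMillis));
    }
}
```

With a statically configured value, the result depends only on configuration
and not on how the node is actually performing.
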
As the name suggests, sampleLatencyNanos should hold a sampled latency, not a
statically configured value. My proposal:
a. Introduce an option, -Dcassandra.overload.threshold, to allow customizing
the overload threshold. The default threshold would be
DatabaseDescriptor.getRangeRpcTimeout().
b. Assign the sampled P99 latency to cfs.sampleLatencyNanos. For overload
detection, we just compare cfs.sampleLatencyNanos with the customizable
threshold above.
c. Use retryDelayNanos (instead of cfs.sampleLatencyNanos) as the waiting time
before a retry (see line 282 of AbstractReadExecutor). This is the value from
the table setting (PERCENTILE or CUSTOM).
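
A rough sketch of how items a-c could fit together, under stated assumptions:
the class and method names are hypothetical, the system property is assumed to
be expressed in milliseconds, and the default is passed in as a parameter
rather than read from DatabaseDescriptor. It is meant to illustrate the
proposal, not to stand in for the actual patch.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of proposal items a-c; none of these names exist in Cassandra.
class SpeculationSketch
{
    // (a) Overload threshold, overridable with -Dcassandra.overload.threshold.
    // Assumption: the property value is in milliseconds.
    static long overloadThresholdNanos(long defaultTimeoutMillis)
    {
        long millis = Long.getLong("cassandra.overload.threshold", defaultTimeoutMillis);
        return TimeUnit.MILLISECONDS.toNanos(millis);
    }

    // (b) Overload detection compares the sampled P99 latency with the threshold.
    static boolean isOverloaded(long sampledP99Nanos, long thresholdNanos)
    {
        return sampledP99Nanos > thresholdNanos;
    }

    // (c) The wait before a speculative retry comes from the table setting
    // (retryDelayNanos), not from the sampled latency. Returns -1 when
    // speculation should be skipped because the node looks overloaded.
    static long speculationDelayNanos(long sampledP99Nanos, long retryDelayNanos, long thresholdNanos)
    {
        return isOverloaded(sampledP99Nanos, thresholdNanos) ? -1L : retryDelayNanos;
    }
}
```

Keeping the sampled value and the configured delay in separate variables means
a table configured with speculative_retry='10ms' still stops speculating once
the sampled P99 exceeds the overload threshold.
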