[ https://issues.apache.org/jira/browse/CASSANDRA-20743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987030#comment-17987030 ]
Dmitry Konstantinov commented on CASSANDRA-20743:
-------------------------------------------------

I see what you mean: yes, there's no difference whether it's local replica A or remote replica B that's slow; the issue will still occur. Regarding the dynamic snitch: yes, it's actually the reason why only 2% of requests are slow for a given replica in my example, rather than, say, 60%. This is because the dynamic snitch uses the 50th percentile (median) latency to score replicas, so higher-percentile latency issues aren't visible to it.

> Inflation for speculative retry 99% threshold if one replica is slow
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-20743
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20743
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination
>            Reporter: Dmitry Konstantinov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>
> I have executed a set of LOCAL_QUORUM read tests with a 3-node Cassandra
> cluster (4.1.4), where a slow disk IO read is emulated on one of the nodes
> using a configured delay added to SSTable disk-level reads with a configured
> probability. The purpose of these tests is to ensure that Cassandra does not
> degrade significantly, latency-wise, when a single replica is unhealthy.
> During these tests I observed an interesting behaviour: drift/inflation of
> the speculative retry threshold value.
>
> We have a coordinator node, which is a replica as well. Let's assume we have
> an injected read delay of 100ms with 2% probability within this node, and the
> 2 other nodes are healthy. A usual read is served by the local node plus one
> of the remote nodes.
>
> Because of the introduced delay, 2% of requests cross the speculative retry
> threshold and trigger a speculative retry to the second remote replica.
> The speculative retry threshold is calculated as the +coordinator latency+
> 99th percentile by default. In these 2% of cases the coordinator latency is
> actually equal to the time waited until the speculative retry fires plus the
> time to execute the request to a remote replica, so we contribute this value
> back into our coordinator latency metric and thereby create a degradation
> feedback loop: while the 2% delay for local disk reads is in place, the
> speculative retry threshold keeps growing in steps equal to the time to
> execute the request to a remote replica, degrading more and more.
>
> A possible workaround is the MIN(99p,Xms) speculative retry option introduced
> in CASSANDRA-14293, but it is environment-specific and may depend on the
> workload, so it may not be easy to define the right value for X.
>
> I have found the same issue reported for ScyllaDB -
> [https://github.com/scylladb/scylladb/pull/8783] ; to address it, they
> started to use replica read response times instead of the full coordinator
> read time when evaluating the speculative retry threshold.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
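The feedback loop from the quoted description can be sketched with a small, deterministic simulation. This is illustrative only: the latency numbers, window size, and the exactly-2%-slow pattern are assumptions, not measurements from the ticket, and the nearest-rank percentile stands in for Cassandra's actual latency histogram.

```python
# Sketch of the speculative-retry feedback loop described in CASSANDRA-20743.
# All numbers below are illustrative assumptions.

def percentile(values, p):
    """Nearest-rank percentile of a list of latencies."""
    s = sorted(values)
    return s[min(len(s) - 1, int(p / 100.0 * len(s)))]

FAST_MS = 5      # assumed latency of a healthy read
DELAY_MS = 100   # injected slow-read delay on the local replica
REMOTE_MS = 5    # assumed cost of the speculative read to a healthy remote

# Seed the coordinator latency histogram with healthy reads only.
threshold = percentile([FAST_MS] * 1000, 99)

for step in range(5):
    window = []
    for i in range(1000):
        if i % 50 == 0:  # exactly 2% of reads hit the injected delay
            # The coordinator waits until the threshold fires, then pays the
            # remote read on top -- unless the delayed local read wins first.
            window.append(min(DELAY_MS, threshold + REMOTE_MS))
        else:
            window.append(FAST_MS)
    # The threshold is recomputed from *coordinator* latency, which already
    # includes the previous threshold: this is the feedback loop.
    threshold = percentile(window, 99)
    print(f"step {step}: p99 threshold = {threshold} ms")
```

Because the p99 always lands inside the 2% slow tail, each window bumps the threshold by REMOTE_MS (10, 15, 20, 25, 30 ms across the five steps here), and it would keep climbing until it saturates near DELAY_MS. Scoring from replica read response times instead, as in the ScyllaDB fix referenced above, breaks the loop: the slow replica's reads are measured directly and the threshold no longer contains itself.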