[
https://issues.apache.org/jira/browse/CASSANDRA-18766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755167#comment-17755167
]
Ivans Novikovs commented on CASSANDRA-18766:
--------------------------------------------
There are multiple clusters where I see this. All of them were upgraded from
v4.0.7 to v4.1.3 several weeks ago using exactly the same config file. Most of
the clusters are 3-node clusters, with all nodes up and client apps using them
constantly. A few clusters are larger.
I noticed higher than usual speculative retries just now and started
investigating. Historical metrics show that this changed exactly during the
upgrade and has stayed this way consistently. No other anomalies have been
found so far.
While troubleshooting, I downgraded one of the nodes in one cluster first to
v4.1.2, then to v4.0.7, and then went back to v4.1.3. The test cluster does not
have any significant load, so I used cassandra-stress to first write some test
data with default settings and then read it back after each version change.
RF=3, CL=QUORUM. On the test node it generates up to 6460 ops/s reads and
~7 ops/s speculative retries, but on v4.1.3 the speculative retries jump to
520 ops/s in this specific test.
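To put those rates in proportion, here is a minimal Python sketch. It assumes
the read rate was about 6460 ops/s in both runs (the figure above is an "up to"
value), and retry_fraction is just a hypothetical helper name for illustration.

```python
def retry_fraction(retries_per_s: float, reads_per_s: float) -> float:
    """Return the fraction of reads that triggered a speculative retry."""
    if reads_per_s <= 0:
        raise ValueError("reads_per_s must be positive")
    return retries_per_s / reads_per_s


# Assumption: the same read throughput applies to both measurements.
READS_PER_S = 6460.0

baseline = retry_fraction(7.0, READS_PER_S)    # rate seen before v4.1.3
upgraded = retry_fraction(520.0, READS_PER_S)  # rate seen on v4.1.3

print(f"baseline: {baseline:.2%} of reads")
print(f"v4.1.3:   {upgraded:.2%} of reads")
print(f"increase: {upgraded / baseline:.0f}x")
```

Under that assumption the retry share goes from roughly 0.1% of reads to about
8% of reads in this test.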
Thank you for describing the process for me. I do not have much Java knowledge,
but I will look through that to try to understand what could be the cause in my
case if it is not a bug.
Regarding node read latency, I did not find anything unusual, but I will
probably do more tests and pay more attention to the cassandra-stress output,
not just the metrics. The additional speculative reads should affect latency in
any case, though. If I do not find anything, I will probably look into setting
up a cluster with default settings to see if I can reproduce it there.
> high speculative retries on v4.1.3
> ----------------------------------
>
> Key: CASSANDRA-18766
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18766
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Coordination
> Reporter: Ivans Novikovs
> Priority: Normal
> Fix For: 4.1.x
>
>
> Speculative retries for reads are up to 10+ times higher on 4.1.3
> compared to 4.0.7 and 4.1.2 when using QUORUM and the default setting of 99p.
> On 4.1.3 after the upgrade I see speculative retries for up to 35% of all
> reads for a specific table. Read latency is stable at around 500 microseconds.
> Java 1.8.0_382 is used.