[ 
https://issues.apache.org/jira/browse/CASSANDRA-18766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755167#comment-17755167
 ] 

Ivans Novikovs commented on CASSANDRA-18766:
--------------------------------------------

There are multiple clusters where I see this. All of them were upgraded from 
v4.0.7 to v4.1.3 several weeks ago, using exactly the same config file. Most of 
the clusters are 3-node clusters, with all nodes up and client apps using them 
constantly. A few clusters are larger.

I noticed the higher than usual speculative retries only just now and started 
to investigate. Historical metrics show that this changed exactly during the 
upgrade and has stayed consistently this way since. No other anomalies could 
be found so far.


While troubleshooting, I tried downgrading one of the nodes in one cluster, 
first to v4.1.2, then to v4.0.7, and then back to v4.1.3. The test cluster 
does not have any significant load, so I used cassandra-stress to first write 
some test data with default settings and then read it back after each version 
change. RF=3, CL=QUORUM. The test node generates up to 6460 ops/s of reads 
and ~7 ops/s of speculative retries, but on v4.1.3 the retries jump to 520 
ops/s in this specific test.
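
For scale, the figures above can be turned into retry ratios. This is only a 
minimal sketch: it assumes the read rate was about the same 6460 ops/s on every 
version (the comment does not state the v4.1.3 read rate explicitly) and that 
~7 ops/s is the retry rate on v4.0.7 and v4.1.2. The function and constant 
names are illustrative only.

```python
def retry_ratio(retries_per_s: float, reads_per_s: float) -> float:
    """Return the fraction of reads that triggered a speculative retry."""
    if reads_per_s <= 0:
        raise ValueError("reads_per_s must be positive")
    return retries_per_s / reads_per_s


# Peak read rate from the cassandra-stress run.
# ASSUMPTION: the same rate applies on all three versions.
READS_PER_S = 6460.0

baseline = retry_ratio(7.0, READS_PER_S)    # v4.0.7 / v4.1.2 (as read from the comment)
upgraded = retry_ratio(520.0, READS_PER_S)  # v4.1.3

print(f"baseline: {baseline:.2%}")
print(f"v4.1.3:   {upgraded:.2%}")
print(f"increase: {upgraded / baseline:.0f}x")
```

Under those assumptions, the retry share goes from roughly 0.1% of reads to 
roughly 8% of reads.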

Thank you for describing the process for me. I do not have much Java 
knowledge, but I will look through it to try to understand what could be the 
cause in my case, if it is not a bug.


Regarding node read latency, I did not find anything unusual, but I will 
perhaps do more tests and pay more attention to the cassandra-stress output, 
not just the metrics. The additional speculative reads should affect latency 
anyway, though. If I do not find anything, I will probably look into setting 
up a cluster with default settings to see if I can reproduce it there.

> high speculative retries on v4.1.3
> ----------------------------------
>
>                 Key: CASSANDRA-18766
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18766
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination
>            Reporter: Ivans Novikovs
>            Priority: Normal
>             Fix For: 4.1.x
>
>
> Speculative retries for reads are 10 or more times higher on 4.1.3 
> compared to 4.0.7 and 4.1.2 when using QUORUM and the default setting of 99p.
> On 4.1.3 after upgrade I see speculative retries for up to 35% of all reads 
> for specific table. Latency for reads is stable around 500 microseconds.
> java 1.8.0_382 is used



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
