Simon Zhou created CASSANDRA-13577:
--------------------------------------
Summary: Fix dynamic endpoint snitch for sub-millisecond use case
Key: CASSANDRA-13577
URL: https://issues.apache.org/jira/browse/CASSANDRA-13577
Project: Cassandra
Issue Type: Bug
Reporter: Simon Zhou
Assignee: Simon Zhou
Fix For: 3.0.x
This is a follow-up to https://issues.apache.org/jira/browse/CASSANDRA-6908.
After disabling severity (CASSANDRA-11737/CASSANDRA-11738) in a few production
clusters, I observed that the scores for all the endpoints are mostly 0.0.
Through debugging, I found the cause: these clusters have p50 read latency
well below 1ms, and the network latency is also <0.1ms (round trip). Note that
the dynamic endpoint snitch (DES) uses the p50 of sampled read latencies, with
millisecond as the time unit. That means that if the latency is mostly below
1ms, the score will be 0, so the scores no longer distinguish fast endpoints
from slow ones. This is definitely not what we want. To make DES work for
these sub-millisecond use cases, we should change the time unit to at least
microsecond, or even nanosecond. I'll provide a patch soon.
Evidence of the p50 latency:
{code}
nodetool tablehistograms <keyspace> <table>
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                      (micros)       (micros)      (bytes)
50%         2.00      35.43          454.83        20501           3
75%         2.00      42.51          654.95        29521           3
95%         3.00      182.79         943.13        61214           3
98%         4.00      263.21         1131.75       73457           3
99%         4.00      315.85         1358.10       88148           3
Min         0.00      9.89           11.87         61              3
Max         5.00      654.95         129557.75     943127          3
{code}
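To illustrate the truncation described above, here is a minimal sketch (not the actual DES code) of what happens when a latency like the 454.83 micros p50 read latency from the histogram is recorded in whole milliseconds versus whole microseconds. The class name `LatencyTruncationSketch` and the helper `sample` are hypothetical and exist only for this example; the only real API used is `java.util.concurrent.TimeUnit`.

```java
import java.util.concurrent.TimeUnit;

public class LatencyTruncationSketch {
    // Hypothetical helper (not a Cassandra API): converts a latency measured
    // in nanoseconds into the unit used for scoring. TimeUnit.convert
    // truncates toward zero, so anything under one unit becomes 0.
    static long sample(long latencyNanos, TimeUnit scoreUnit) {
        return scoreUnit.convert(latencyNanos, TimeUnit.NANOSECONDS);
    }

    public static void main(String[] args) {
        // p50 read latency from the histogram: 454.83 micros = 454,830 nanos.
        long p50ReadNanos = 454_830L;

        // In milliseconds the sample collapses to 0.
        System.out.println("millis: " + sample(p50ReadNanos, TimeUnit.MILLISECONDS));

        // In microseconds the sample keeps usable resolution.
        System.out.println("micros: " + sample(p50ReadNanos, TimeUnit.MICROSECONDS));
    }
}
```

With millisecond resolution, every sample below 1ms is stored as 0, so the median of the samples is also 0; with microsecond resolution the same sample is stored as 454.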