Paul Ayers created CASSANDRA-19049:
--------------------------------------
Summary: Speculative read retries and 3 replica responses driving
up latencies on CL ONE queries with RF 5 keyspace in C* 4.0.7
Key: CASSANDRA-19049
URL: https://issues.apache.org/jira/browse/CASSANDRA-19049
Project: Cassandra
Issue Type: Bug
Reporter: Paul Ayers
Attachments: iad8a-ra20-26a.log, pdx3a-ra1-15a.log,
tracepdx3a-ra1-15a.log
A Cassandra 4.0.7 cluster is experiencing very high CPU utilization and
extremely high latencies when certain partitions become hot.
This is occurring on a keyspace with a replication factor (RF) of 5, queried at
consistency level (CL) ONE. Each node has ~10 data drives and the data is
distributed round-robin among them, which is why some traces show multiple
SSTables being read.
All queries are single-partition queries.
We have probably not identified every partition this occurs for, but for the
couple that we have found, reads hit at least 3 of the 5 replicas in many cases
and trigger a lot of speculative retries, even though the CL is ONE. We ran
some count queries against a couple of the partitions known to cause issues to
capture trace output; the traces are attached to the Jira. When any of these
partitions becomes hot, it pegs the CPU, drives up latencies, and causes a lot
of timeouts.
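For anyone wanting to reproduce a comparable trace, a minimal cqlsh sketch
follows. The partition key values ('example.metric', 'host', 0) are hypothetical
placeholders, not the actual hot partitions, and the exact queries used for the
attached traces are not reproduced here.
{code:sql}
-- cqlsh session commands: read at CL ONE with request tracing enabled
CONSISTENCY ONE;
TRACING ON;

-- Single-partition count query; all three partition key columns are bound.
-- The literal values below are placeholders (assumption), not real keys.
SELECT count(*)
FROM v2metadata.tag_values_fresh
WHERE metric_name = 'example.metric'
  AND tag_names = 'host'
  AND shard_id = 0;

TRACING OFF;
{code}
The trace output should show which replicas were sent read requests, which
makes it possible to check how many replicas a CL ONE read actually contacts.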
I suspect this is a bug related to the RF 5 keyspace, since we would probably
have seen it already with our RF 3 keyspaces, but I have not yet tested whether
changing the RF to 3 resolves the issue.
The schema for the table with the problematic partitions:
{code:java}
CREATE TABLE v2metadata.tag_values_fresh (
    metric_name ascii,
    tag_names ascii,
    shard_id tinyint,
    v2namespace ascii,
    tag_values ascii,
    metric_id blob,
    timestamp_mins_last varint,
    PRIMARY KEY ((metric_name, tag_names, shard_id), v2namespace, tag_values)
) WITH CLUSTERING ORDER BY (v2namespace ASC, tag_values ASC)
    AND additional_write_policy = '99p'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND cdc = false
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4', 'unchecked_tombstone_compaction': 'true'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 864000
    AND extensions = {}
    AND gc_grace_seconds = 10800
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99p';
{code}
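One possible way to check whether the table's speculative_retry = '99p' setting
accounts for the extra replica reads is to disable speculative read retries on
this table temporarily and compare traces. This is only a sketch and has not
been run against this cluster; it assumes the 'NEVER' value that Cassandra 4.0
accepts for this option.
{code:sql}
-- Diagnostic sketch (assumption: run during a controlled test window).
-- Turn off speculative read retries for the affected table.
ALTER TABLE v2metadata.tag_values_fresh
    WITH speculative_retry = 'NEVER';

-- After capturing new traces, restore the value from the schema above.
ALTER TABLE v2metadata.tag_values_fresh
    WITH speculative_retry = '99p';
{code}
If CL ONE reads still contact 3 or more replicas with speculative retry
disabled, the additional requests are presumably coming from somewhere else.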