Dan Kinder created CASSANDRA-13923:
--------------------------------------
Summary: Flushers blocked due to many SSTables
Key: CASSANDRA-13923
URL: https://issues.apache.org/jira/browse/CASSANDRA-13923
Project: Cassandra
Issue Type: Bug
Components: Compaction, Local Write-Read Paths
Environment: Cassandra 3.11.0
CentOS 6 (downgraded JNA)
64GB RAM
12-disk JBOD
Reporter: Dan Kinder
Attachments: cassandra-jstack-readstage.txt, cassandra-jstack.txt
This started on the mailing list, and I'm not 100% sure of the root cause; feel
free to re-title if needed.
I just upgraded Cassandra from 2.2.6 to 3.11.0. Within a few hours of serving
traffic, thread pools begin to back up and grow pending tasks indefinitely.
This happens to multiple different stages (Read, Mutation) and consistently
builds pending tasks for MemtablePostFlush and MemtableFlushWriter.
Using jstack shows that there is blocking going on when trying to call
getCompactionCandidates, which seems to happen on flush. We have fairly large
nodes that have ~15,000 SSTables per node, all LCS.
It seems like this can cause reads to get blocked because they try to acquire a
read lock when calling shouldDefragment.
And writes, of course, block once we can't allocate any more memtables, because
flushes are backed up.
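
To illustrate the kind of contention described above, here is a minimal sketch. It is not Cassandra's code. It assumes, hypothetically, that a slow SSTable candidate scan holds the write side of a read-write lock while a read-path call only needs the read side. The class name, method names, and timings are all invented for the illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockContentionSketch {
    // Hypothetical stand-in for a strategy-level lock; not Cassandra's actual field.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Simulates a slow candidate scan (e.g. over many SSTables) under the write lock.
    void slowCandidateScan(long holdMillis, CountDownLatch acquired) throws InterruptedException {
        lock.writeLock().lock();
        try {
            acquired.countDown();      // signal that the write lock is now held
            Thread.sleep(holdMillis);  // stand-in for the expensive scan
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Simulates a read-path call that needs the read lock, giving up after waitMillis.
    boolean tryRead(long waitMillis) throws InterruptedException {
        if (lock.readLock().tryLock(waitMillis, TimeUnit.MILLISECONDS)) {
            lock.readLock().unlock();
            return true;
        }
        return false;
    }

    // Returns true if the reader got the lock within waitMillis while a scan
    // of holdMillis was already in progress.
    public static boolean readSucceedsDuringScan(long holdMillis, long waitMillis) {
        LockContentionSketch sketch = new LockContentionSketch();
        CountDownLatch acquired = new CountDownLatch(1);
        Thread writer = new Thread(() -> {
            try {
                sketch.slowCandidateScan(holdMillis, acquired);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            writer.start();
            acquired.await();  // wait until the scan actually holds the write lock
            boolean ok = sketch.tryRead(waitMillis);
            writer.join();
            return ok;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Long scan, impatient reader: the read is expected to time out.
        System.out.println(readSucceedsDuringScan(500, 50));
        // Short scan, patient reader: the read is expected to go through.
        System.out.println(readSucceedsDuringScan(50, 2000));
    }
}
```

In this toy model the reader is starved only for as long as the scan holds the lock. If each scan gets slower as the SSTable count grows and scans are triggered on every flush, readers and flushers would queue up behind it, which is consistent with the growing pending tasks reported here.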
We did not have this problem in 2.2.6, so it seems like there is some
regression that makes calls like getCompactionCandidates, which list out the
SSTables, incredibly slow.
In our case this causes nodes to build up pending tasks and simply stop
responding to requests.