regis le bretonnic created CASSANDRA-16638:
----------------------------------------------
Summary: compactions/repairs hangs
Key: CASSANDRA-16638
URL: https://issues.apache.org/jira/browse/CASSANDRA-16638
Project: Cassandra
Issue Type: Bug
Components: Consistency/Repair, Local/Compaction
Reporter: regis le bretonnic
Hi,
We have been running into a problem during repairs (though it is more likely a
compaction problem) since we upgraded from 3.11.1 to 3.11.10.
We are using Reaper, but the issue does not seem to come from it (according to
[[email protected]]). When the problem happens, repairs driven by
Reaper are blocked.
Reaper hangs with the message "All nodes are busy or have too many
pending compactions for the remaining candidate segments.", and indeed one node
has a large number of pending compaction tasks:
{code:java}
$ nodetool compactionstats
pending tasks: 95
- mt_metrics.metric_32: 95
{code}
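To spot the affected node without reading each output by hand, the text above can be parsed. This is a minimal sketch that assumes the output has exactly the two-line shape shown here; the function name `parse_pending` is hypothetical, and other Cassandra versions may print additional lines that this does not handle.

```python
import re
from typing import Dict, Tuple


def parse_pending(output: str) -> Tuple[int, Dict[str, int]]:
    """Return (total pending tasks, pending tasks per table).

    Assumes the `nodetool compactionstats` layout quoted in this report:
    a "pending tasks: N" line followed by "- keyspace.table: N" lines.
    """
    total = 0
    per_table: Dict[str, int] = {}
    for raw in output.splitlines():
        line = raw.strip()
        m = re.match(r"pending tasks:\s*(\d+)$", line)
        if m:
            total = int(m.group(1))
            continue
        m = re.match(r"-\s*([\w.]+):\s*(\d+)$", line)
        if m:
            per_table[m.group(1)] = int(m.group(2))
    return total, per_table


# Output copied from this report.
SAMPLE = """pending tasks: 95
- mt_metrics.metric_32: 95
"""

total, per_table = parse_pending(SAMPLE)
print(total, per_table)
```

A monitoring script could run this against each node and flag any node whose total stays high across several checks; a suitable threshold depends on the cluster and is not given here.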
The errors in the log are:
{code:java}
WARN [CompactionExecutor:12909] 2021-04-28 08:59:51,241
LeveledCompactionStrategy.java:144 - Could not acquire references for
compacting SSTables [BigTableReader(path='/var/lib/cassandra/d
....
WARN [CompactionExecutor:12909] 2021-04-28 09:00:19,484
LeveledCompactionStrategy.java:144 - Could not acquire references for
compacting SSTables [BigTableReader(path='/var/lib/cassandra/d
....
WARN [CompactionExecutor:12908] 2021-04-28 09:00:51,241
LeveledCompactionStrategy.java:144 - Could not acquire references for
compacting SSTables [BigTableReader(path='/var/lib/cassandra/d
....
WARN [CompactionExecutor:12907] 2021-04-28 08:58:51,097
LeveledCompactionStrategy.java:144 - Could not acquire references for
compacting SSTables
[BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350757-big-Data.db'),
BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350755-big-Data.db'),
BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350738-big-Data.db'),
BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350759-big-Data.db'),
BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350761-big-Data.db'),
BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350740-big-Data.db'),
BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350751-big-Data.db'),
BigTableReader(path='/var/lib/cassandra/data/mt_metrics/
....
{code}
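To see how often this warning recurs and on which compaction threads, the log can be scanned for it. This is a rough sketch that assumes each entry sits on a single line in the log file (the entries above are wrapped by the mail client) and that the prefix matches the layout quoted here; the helper name `count_reference_warnings` and the shortened sample lines are illustrative only.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the prefix of the warning quoted in this report.
PATTERN = re.compile(
    r"^WARN\s+\[(?P<thread>CompactionExecutor:\d+)\]\s+"
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+\s+"
    r"LeveledCompactionStrategy\.java:\d+ - Could not acquire references"
)


def count_reference_warnings(lines: Iterable[str]) -> Counter:
    """Count 'Could not acquire references' warnings per executor thread."""
    counts: Counter = Counter()
    for line in lines:
        m = PATTERN.match(line)
        if m:
            counts[m.group("thread")] += 1
    return counts


# Entries taken from this report, joined onto one line; SSTable lists elided.
SAMPLE_LOG = [
    "WARN [CompactionExecutor:12909] 2021-04-28 08:59:51,241 "
    "LeveledCompactionStrategy.java:144 - Could not acquire references for "
    "compacting SSTables [...]",
    "WARN [CompactionExecutor:12909] 2021-04-28 09:00:19,484 "
    "LeveledCompactionStrategy.java:144 - Could not acquire references for "
    "compacting SSTables [...]",
    "WARN [CompactionExecutor:12908] 2021-04-28 09:00:51,241 "
    "LeveledCompactionStrategy.java:144 - Could not acquire references for "
    "compacting SSTables [...]",
]

counts = count_reference_warnings(SAMPLE_LOG)
print(dict(counts))
```

In practice the lines would come from the node's log file, for example `open(path)` passed directly as `lines`.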
The error has occurred several times over the past few weeks and, so far, has
always involved LCS tables.
a.dejanoski pointed me to https://issues.apache.org/jira/browse/CASSANDRA-15242,
but I see no trace of messages like "disk boundaries are out of date for
keyspacename.tablename" or "Refreshing disk boundary cache for
keyspacename.tablename".
The workaround is simple: restart the affected node once it has been
identified. The pending compaction tasks then run normally.
We have the issue on 2 of our clusters on 3.11.10.
Has anyone else encountered this issue?