regis le bretonnic created CASSANDRA-16638:
----------------------------------------------

             Summary: compactions/repairs hangs
                 Key: CASSANDRA-16638
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16638
             Project: Cassandra
          Issue Type: Bug
          Components: Consistency/Repair, Local/Compaction
            Reporter: regis le bretonnic


Hi

We have been hitting an issue during repairs (more likely a compaction issue, 
in fact) since we upgraded from 3.11.1 to 3.11.10.

We are using Reaper, but the issue doesn't seem to come from it (according to 
[[email protected]]). When the problem happens, repairs driven by 
Reaper are blocked.

Basically, Reaper hangs with the message "All nodes are busy or have too many 
pending compactions for the remaining candidate segments." and indeed one node 
has a lot of pending compaction tasks:

 
{code:java}
$ nodetool compactionstats
pending tasks: 95
- mt_metrics.metric_32: 95 
{code}
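To spot the affected node without reading the output by hand, the text above can be parsed into a per-table count. This is only a minimal sketch: the function name parse_pending and the variable SAMPLE are made up for illustration, and it assumes the output keeps the two-line-style layout shown above ("pending tasks: N" followed by "- keyspace.table: N" lines).

```python
import re
from typing import Dict, Tuple


def parse_pending(output: str) -> Tuple[int, Dict[str, int]]:
    """Parse `nodetool compactionstats` text into (total, per-table counts).

    Hypothetical helper: assumes the layout shown in this report.
    """
    total = 0
    per_table: Dict[str, int] = {}
    for raw in output.splitlines():
        line = raw.strip()
        # Header line, e.g. "pending tasks: 95"
        m = re.match(r"pending tasks:\s*(\d+)$", line)
        if m:
            total = int(m.group(1))
            continue
        # Per-table line, e.g. "- mt_metrics.metric_32: 95"
        m = re.match(r"-\s*(\S+):\s*(\d+)$", line)
        if m:
            per_table[m.group(1)] = int(m.group(2))
    return total, per_table


# Sample copied from the output in this report.
SAMPLE = "pending tasks: 95\n- mt_metrics.metric_32: 95 \n"

if __name__ == "__main__":
    total, per_table = parse_pending(SAMPLE)
    print(total, per_table)
```

On a live node the input could come from subprocess.run(["nodetool", "compactionstats"], capture_output=True, text=True).stdout, and a count that stays high for the same table across several runs would point at the stuck node.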
The errors in the log are:

 
{code:java}
WARN [CompactionExecutor:12909] 2021-04-28 08:59:51,241 
LeveledCompactionStrategy.java:144 - Could not acquire references for 
compacting SSTables [BigTableReader(path='/var/lib/cassandra/d
....
WARN [CompactionExecutor:12909] 2021-04-28 09:00:19,484 
LeveledCompactionStrategy.java:144 - Could not acquire references for 
compacting SSTables [BigTableReader(path='/var/lib/cassandra/d
....
WARN [CompactionExecutor:12908] 2021-04-28 09:00:51,241 
LeveledCompactionStrategy.java:144 - Could not acquire references for 
compacting SSTables [BigTableReader(path='/var/lib/cassandra/d
....
WARN [CompactionExecutor:12907] 2021-04-28 08:58:51,097 
LeveledCompactionStrategy.java:144 - Could not acquire references for 
compacting SSTables 
[BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350757-big-Data.db'),
 
BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350755-big-Data.db'),
 
BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350738-big-Data.db'),
 
BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350759-big-Data.db'),
 
BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350761-big-Data.db'),
 
BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350740-big-Data.db'),
 
BigTableReader(path='/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48/md-350751-big-Data.db'),
 BigTableReader(path='/var/lib/cassandra/data/mt_metrics/
.... 
{code}
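To check whether the warnings always concern the same tables, they can be counted per keyspace.table. The sketch below is illustrative only: count_blocked_tables, WARNING and PATH_RE are made-up names, it assumes each warning is a single physical line in the log file (the wrapping above comes from the email), and it assumes the data directory layout visible in the paths above (keyspace/table-tableid/sstable file).

```python
import re
from collections import Counter
from typing import Iterable

# Message text as it appears in the warnings quoted in this report.
WARNING = "Could not acquire references for compacting SSTables"

# Assumed layout: .../<keyspace>/<table>-<32 hex chars>/<sstable file>
PATH_RE = re.compile(r"path='[^']*/([^/']+)/([^/']+)-[0-9a-f]{32}/[^/']+'")


def count_blocked_tables(log_lines: Iterable[str]) -> Counter:
    """Count warnings per keyspace.table (hypothetical helper)."""
    counts: Counter = Counter()
    for line in log_lines:
        if WARNING not in line:
            continue
        # Count each table once per warning, however many SSTables it lists.
        tables = {f"{ks}.{tbl}" for ks, tbl in PATH_RE.findall(line)}
        for table in tables:
            counts[table] += 1
    return counts


# Sample line rebuilt from two paths quoted in this report.
_DIR = "/var/lib/cassandra/data/mt_metrics/metric_32-23300de089c311e882a61bd0fd209f48"
SAMPLE_LINE = (
    "WARN [CompactionExecutor:12907] 2021-04-28 08:58:51,097 "
    "LeveledCompactionStrategy.java:144 - " + WARNING + " "
    "[BigTableReader(path='" + _DIR + "/md-350757-big-Data.db'), "
    "BigTableReader(path='" + _DIR + "/md-350755-big-Data.db')]"
)

if __name__ == "__main__":
    print(count_blocked_tables([SAMPLE_LINE, "INFO unrelated line"]))
```

Truncated warnings whose path is cut off before the table directory (as in the first three entries above) are simply not attributed to any table.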

The error has happened several times within a few weeks and, so far, has 
always concerned LCS tables.

a.dejanoski pointed me to https://issues.apache.org/jira/browse/CASSANDRA-15242 
but I see no trace of messages like "disk boundaries are out of date for 
keyspacename.tablename" or "Refreshing disk boundary cache for 
keyspacename.tablename".

The workaround is simple: restart the node once it has been identified. The 
pending compaction tasks then run normally.

We have the issue on 2 of our clusters running 3.11.10.
Has anyone else encountered this issue?



