Re: Cassandra 4.0 hanging on restart

Paul Chandler Wed, 19 Jan 2022 05:10:36 -0800

Hi Bowen,

Thanks for the reply, these have been our normal shutdowns, so we do a nodetool 
drain before restarting the service, so I would have thought there should not 
be any commtlogs


However there is these messages for one commit log, But looks like it has 
finished quickly and correctly:

INFO  [main] 2022-01-19 10:08:22,811  CommitLog.java:173 - Replaying 
/var/lib/cassandra/commitlog/CommitLog-7-1642094921295.log
WARN  [main] 2022-01-19 10:08:22,839  CommitLogReplayer.java:305 - Origin of 2 
sstables is unknown or doesn't match the local node; commitLogIntervals for 
them were ignored
Repeated about 10 times
WARN  [main] 2022-01-19 10:08:22,842  CommitLogReplayer.java:305 - Origin of 3 
sstables is unknown or doesn't match the local node; commitLogIntervals for 
them were ignored
INFO  [main] 2022-01-19 10:08:22,853  CommitLogReader.java:256 - Finished 
reading /var/lib/cassandra/commitlog/CommitLog-7-1642094921295.log
INFO  [main] 2022-01-19 10:08:22,882  CommitLog.java:175 - Log replay complete, 
0 replayed mutations 

Thanks 

Paul

> On 19 Jan 2022, at 13:03, Bowen Song <bo...@bso.ng> wrote:
> 
> Nothing obvious from the logs you posted.
> 
> Generally speaking, replaying commit log is often the culprit when a node 
> takes a long time to start. I have seen many nodes with large memtable and 
> commit log size limit spending over half an hour replaying the commit log. I 
> usually do a "nodetool flush" before shutting down the node to help speed up 
> the start time if the shutdown was planned. There isn't much you can do about 
> unexpected shutdown, such as server crashes. When that happens, the only 
> reasonable thing to do is wait for the commit log replay to finish. You 
> should see log entries related to replaying commit logs if this is the case.
> 
> However, if you don't find any logs related to replaying commit logs, the 
> cause may be completely different.
> 
> 
> On 19/01/2022 11:54, Paul Chandler wrote:
>> Hi all,
>> 
>> We have upgraded a couple of clusters from 3.11.6, now we are having issues 
>> when we restart the nodes.
>> 
>> The node will either hang or take 10-30 minute to restart, these are the 
>> last messages we have in the system.log:
>> 
>> INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,267  FileUtils.java:545 - 
>> Deleting file during startup: 
>> /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-184-big-Summary.db
>> INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,268  LogTransaction.java:240 
>> - Unfinished transaction log, deleting 
>> /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-185-big-Data.db
>> INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,268  FileUtils.java:545 - 
>> Deleting file during startup: 
>> /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-185-big-Summary.db
>> INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,269  LogTransaction.java:240 
>> - Unfinished transaction log, deleting 
>> /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-186-big-Data.db
>> INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,270  FileUtils.java:545 - 
>> Deleting file during startup: 
>> /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-186-big-Summary.db
>> INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,272  LogTransaction.java:240 
>> - Unfinished transaction log, deleting 
>> /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb_txn_unknowncompactiontype_bc501d00-790f-11ec-9f80-85
>> 8854746758.log
>> INFO  [MemtableFlushWriter:2] 2022-01-19 10:08:23,289  
>> LogTransaction.java:240 - Unfinished transaction log, deleting 
>> /var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb_txn_flush_bc52dc20-790f-11ec-9f80-858854746758.log
>> 
>> The debug log has messages from DiskBoundaryManager.java at the same time, 
>> then it just has the following messages:||
>> 
>> DEBUG [ScheduledTasks:1] 2022-01-19 10:28:09,430  SSLFactory.java:354 - 
>> Checking whether certificates have been updated []
>> DEBUG [ScheduledTasks:1] 2022-01-19 10:38:09,431  SSLFactory.java:354 - 
>> Checking whether certificates have been updated []
>> DEBUG [ScheduledTasks:1] 2022-01-19 10:48:09,431  SSLFactory.java:354 - 
>> Checking whether certificates have been updated []
>> DEBUG [ScheduledTasks:1] 2022-01-19 10:58:09,431  SSLFactory.java:354 - 
>> Checking whether certificates have been updated []
>> 
>> 
>> It seems to get worse after each restart, and then it gets to the state 
>> where it just hangs, then the only thing to do is to re bootstrap the node.
>> 
>> Once I had re bootstrapped all the nodes in the cluster, I thought the 
>> cluster was stable, but I have now got the case where the one of the nodes 
>> is hanging again.
>> 
>> Does anyone have an ideas what is causing the problems ?
>> 
>> 
>> Thanks
>> 
>> Paul Chandler
>>

Re: Cassandra 4.0 hanging on restart

Reply via email to