Hi all - I replaced a node in a 14-node cluster, and it rebuilt OK. I then started to see a lot of timeout errors, and discovered one of the nodes had this message constantly repeated: "waiting to acquire a permit to begin streaming" - so perhaps I hit this bug:
https://www.mail-archive.com/commits@cassandra.apache.org/msg284709.html

I then restarted that node, but it gave a bunch of errors about "unexpected disk state: failed to read transaction log". I deleted the corresponding files and got that node to come up, but now when I restart any of the other nodes in the cluster, they too fail to start back up:

Example:

INFO  [main] 2023-08-30 09:50:46,130 LogTransaction.java:544 - Verifying logfile transaction [nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log in /data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3, /data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3]
ERROR [main] 2023-08-30 09:50:46,154 LogReplicaSet.java:145 - Mismatched line in file nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log: got 'ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37640-big-,0,8][2833571752]' expected 'ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37639-big-,0,8][1997892352]', giving up
ERROR [main] 2023-08-30 09:50:46,155 LogFile.java:164 - Failed to read records for transaction log [nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log in /data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3, /data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3]
ERROR [main] 2023-08-30 09:50:46,156 LogTransaction.java:559 - Unexpected disk state: failed to read transaction log [nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log in /data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3, /data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3]
Files and contents follow:
/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37639-big-,0,8][1997892352]
        ABORT:[,0,0][737437348]
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37640-big-,0,8][2833571752]
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37644-big-,0,8][3122518803]
ADD:[/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37643-big-,0,8][2875951075]
ADD:[/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37642-big-,0,8][884016253]
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37641-big-,0,8][926833718]
/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb_txn_stream_6bfe4220-43b9-11ee-9649-316c953ea746.log
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37640-big-,0,8][2833571752]
                ***Does not match <ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37639-big-,0,8][1997892352]> in first replica file
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37644-big-,0,8][3122518803]
ADD:[/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37643-big-,0,8][2875951075]
ADD:[/data/1/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37642-big-,0,8][884016253]
ADD:[/data/4/cassandra/data/doc/extractedmetadata-25c210e0ada011ebade9fdc1d34336d3/nb-37641-big-,0,8][926833718]

ERROR [main] 2023-08-30 09:50:46,156 CassandraDaemon.java:897 - Cannot remove temporary or obsoleted files for doc.extractedmetadata due to a problem with transaction log files. Please check records with problems in the log messages above and fix them. Refer to the 3.0 upgrading instructions in NEWS.txt for a description of transaction log files.

I then deleted the files, and after many iterations the node eventually came back up. The table 'extractedmetadata' has 29 billion records - just a data point here. I think the 'right' thing to do is to go to each node in turn, stop it, clean up the stale transaction log files, and bring it back up?
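For anyone following along, here is a minimal sketch of the per-node cleanup I'm describing, assuming the only problem is leftover stream transaction logs (the `nb_txn_stream_*.log` files) from the interrupted streaming session. The function name and the dry-run/list behavior are my own; run it only on a node that is already stopped, and check the listed files before actually removing anything:

```shell
#!/bin/sh
# clean_stream_txn_logs DIR [apply]
#
# Lists leftover stream transaction logs (nb_txn_stream_*.log) under
# the given Cassandra data directory. With a second argument of
# "apply" it removes them instead of just listing them. Only these
# txn log files are touched; sstable data files are left alone.
clean_stream_txn_logs() {
    dir="$1"
    mode="${2:-list}"
    find "$dir" -type f -name 'nb_txn_stream_*.log' | while read -r f; do
        if [ "$mode" = "apply" ]; then
            rm -v "$f"
        else
            echo "would remove: $f"
        fi
    done
}
```

Usage would be something like `clean_stream_txn_logs /data/1/cassandra/data` on each data directory after a `nodetool drain` and a clean stop of the Cassandra process, then a review of the output before rerunning with `apply` and restarting the node.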

-Joe


