[ https://issues.apache.org/jira/browse/CASSANDRA-19590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Klay updated CASSANDRA-19590: ----------------------------- Description: I am trying to upgrade from 2.2.19 to 3.0.30/3.11.17. I encountered the following exception during the upgrade process and the 3.0.30/3.11.17 node cannot start up. {code:java} ERROR [main] 2024-04-25 18:46:10,496 JVMStabilityInspector.java:124 - Exiting due to error while processing commit log during initialization. org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Unexpected error deserializing mutation; saved to /tmp/mutation8318204837345269856dat. This may be caused by replaying a mutation against a table with the same name but incompatible schema. Exception follows: java.lang.AssertionError at org.apache.cassandra.db.commitlog.CommitLogReader.readMutation(CommitLogReader.java:471) at org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:404) at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:251) at org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:132) at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:137) at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:189) at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:170) at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:331) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:630) at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:791) {code} h1. Reproduce1 (Flush before upgrade) Upgrade fails when replaying the commit log. This can be reproduced deterministically by 1. Start up cassandra-2.2.19, singe node is enough (Using default configuration) 2. Execute the following commands in cqlsh {code:java} CREATE KEYSPACE ks WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }; CREATE TABLE ks.tb (c1 INT, c0 INT, PRIMARY KEY (c1)); INSERT INTO ks.tb (c1, c0) VALUES (0, 0); ALTER TABLE ks.tb DROP c0 ; ALTER TABLE ks.tb ADD c0 set<INT> ; {code} 3. Stop the old version. {code:java} bin/nodetool -h ::FFFF:127.0.0.1 flush bin/nodetool -h ::FFFF:127.0.0.1 stopdaemon{code} 4. Copy the data and start up the new version node (3.0.30 or 3.11.17) Upgrade crashes with the following error {code:java} ERROR [main] 2024-04-25 18:46:10,496 JVMStabilityInspector.java:124 - Exiting due to error while processing commit log during initialization. org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Unexpected error deserializing mutation; saved to /tmp/mutation8318204837345269856dat. This may be caused by replaying a mutation against a table with the same name but incompatible schema. Exception follows: java.lang.AssertionError at org.apache.cassandra.db.commitlog.CommitLogReader.readMutation(CommitLogReader.java:471) at org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:404) at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:251) at org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:132) at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:137) at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:189) at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:170) at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:331) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:630) at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:791){code} I have attached the system.log when starting up the 3.11.17 node. I also attached the data folder generated from the 2.2.19, start up 3.0.30 or 3.11.17 with this data folder can directly expose the error. h1. Reproduce2 (Drain before upgrade) If DRAIN before the upgrade, the upgrade can finish, but the read for the table will fail and generate the following exception. It can be reproduced upgrading from 2.2 to both 3.0.30 and 3.11.17 h2. 2.2.19 => 3.0.30 with DRAIN Execute the same command in Reproduce1 in 2.2.19, when stop 2.2.19 node, use drain {code:java} bin/nodetool -h ::FFFF:127.0.0.1 drain bin/nodetool -h ::FFFF:127.0.0.1 stopdaemon{code} Start up the new version node (3.0.30), perform a read operation {code:java} ➜ apache-cassandra-3.0.30 bin/cqlsh Connected to Test Cluster at 127.0.0.1:9042. [cqlsh 5.0.1 | Cassandra 3.0.30 | CQL spec 3.4.0 | Native protocol v4] Use HELP for help. cqlsh> SELECT * FROM ks.tb; ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed - received 0 responses and 1 failures" info={'failures': 1, 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'} {code} system log {code:java} ERROR [SharedPool-Worker-2] 2024-04-27 16:09:55,263 AbstractLocalAwareExecutorService.java:166 - Uncaught exception on thread Thread[SharedPool-Worker-2,10,main] java.lang.AssertionError: null at org.apache.cassandra.db.rows.BufferCell.<init>(BufferCell.java:48) at org.apache.cassandra.db.LegacyLayout$CellGrouper.addCell(LegacyLayout.java:1476) at org.apache.cassandra.db.LegacyLayout$CellGrouper.addAtom(LegacyLayout.java:1395) at org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.readRow(UnfilteredDeserializer.java:554) at org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.hasNext(UnfilteredDeserializer.java:510) at org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer.hasNext(UnfilteredDeserializer.java:336) at org.apache.cassandra.db.columniterator.AbstractSSTableIterator.readStaticRow(AbstractSSTableIterator.java:148) at org.apache.cassandra.db.columniterator.AbstractSSTableIterator.<init>(AbstractSSTableIterator.java:104) at org.apache.cassandra.db.columniterator.SSTableIterator.<init>(SSTableIterator.java:51) at org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(BigTableReader.java:76) at org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:366) at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48) at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.metadata(LazilyInitializedUnfilteredRowIterator.java:53) at org.apache.cassandra.db.transform.RTBoundValidator.applyToPartition(RTBoundValidator.java:60) at org.apache.cassandra.db.transform.RTBoundValidator.applyToPartition(RTBoundValidator.java:34) at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:96) at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:369) at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:189) at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:158) at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$4.hasNext(UnfilteredPartitionIterators.java:220) at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:92) at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:292) at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:187) at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:180) at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:176) at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:76) at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:347) at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1914) at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2652) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134) at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) at java.lang.Thread.run(Thread.java:750) {code} h2. 2.2.19 => 3.11.17 with DRAIN Execute the same command in Reproduce1 in 2.2.19, when stop 2.2.19 node, use drain {code:java} bin/nodetool -h ::FFFF:127.0.0.1 drain bin/nodetool -h ::FFFF:127.0.0.1 stopdaemon{code} Start up the new version node (3.11.17), perform a read operation {code:java} ➜ apache-cassandra-3.11.17 bin/cqlsh Connected to Test Cluster at 127.0.0.1:9042. [cqlsh 5.0.1 | Cassandra 3.11.17 | CQL spec 3.4.4 | Native protocol v4] Use HELP for help. cqlsh> SELECT * FROM ks.tb; ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed - received 0 responses and 1 failures" info={'failures': 1, 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'} {code} system log {code:java} ERROR [ReadStage-2] 2024-04-27 02:10:24,085 AbstractLocalAwareExecutorService.java:166 - Uncaught exception on thread Thread[ReadStage-2,10,main] java.lang.AssertionError: null at org.apache.cassandra.db.rows.BufferCell.<init>(BufferCell.java:44) at org.apache.cassandra.db.LegacyLayout$CellGrouper.addCell(LegacyLayout.java:1489) at org.apache.cassandra.db.LegacyLayout$CellGrouper.addAtom(LegacyLayout.java:1407) at org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.readRow(UnfilteredDeserializer.java:555) at org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.hasNext(UnfilteredDeserializer.java:511) at org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer.hasNext(UnfilteredDeserializer.java:336) at org.apache.cassandra.db.columniterator.AbstractSSTableIterator.readStaticRow(AbstractSSTableIterator.java:177) at org.apache.cassandra.db.columniterator.AbstractSSTableIterator.<init>(AbstractSSTableIterator.java:113) at org.apache.cassandra.db.columniterator.SSTableIterator.<init>(SSTableIterator.java:49) at org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(BigTableReader.java:72) at org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:392) at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48) at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.metadata(LazilyInitializedUnfilteredRowIterator.java:58) at org.apache.cassandra.db.transform.RTBoundValidator.applyToPartition(RTBoundValidator.java:60) at org.apache.cassandra.db.transform.RTBoundValidator.applyToPartition(RTBoundValidator.java:34) at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:96) at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374) at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186) at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155) at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$4.hasNext(UnfilteredPartitionIterators.java:233) at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:92) at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:305) at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:187) at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:180) at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:176) at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:76) at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:360) at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:2007) at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2773) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134) at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:113) at java.lang.Thread.run(Thread.java:750) {code} was: I am trying to upgrade from 2.2.19 to 3.0.30/3.11.17. I encountered the following exception during the upgrade process and the 3.0.30/3.11.17 node cannot start up. {code:java} ERROR [main] 2024-04-25 18:46:10,496 JVMStabilityInspector.java:124 - Exiting due to error while processing commit log during initialization. org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Unexpected error deserializing mutation; saved to /tmp/mutation8318204837345269856dat. This may be caused by replaying a mutation against a table with the same name but incompatible schema. Exception follows: java.lang.AssertionError at org.apache.cassandra.db.commitlog.CommitLogReader.readMutation(CommitLogReader.java:471) at org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:404) at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:251) at org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:132) at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:137) at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:189) at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:170) at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:331) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:630) at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:791) {code} h1. Reproduce1 (Flush before upgrade) Upgrade fails when reply the commit log. This can be reproduced deterministically by 1. Start up cassandra-2.2.19, singe node is enough (Using default configuration) 2. Execute the following commands in cqlsh {code:java} CREATE KEYSPACE ks WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }; CREATE TABLE ks.tb (c1 INT, c0 INT, PRIMARY KEY (c1)); INSERT INTO ks.tb (c1, c0) VALUES (0, 0); ALTER TABLE ks.tb DROP c0 ; ALTER TABLE ks.tb ADD c0 set<INT> ; {code} 3. Stop the old version. {code:java} bin/nodetool -h ::FFFF:127.0.0.1 flush bin/nodetool -h ::FFFF:127.0.0.1 stopdaemon{code} 4. Copy the data and start up the new version node (3.0.30 or 3.11.17) Upgrade crashes with the following error {code:java} ERROR [main] 2024-04-25 18:46:10,496 JVMStabilityInspector.java:124 - Exiting due to error while processing commit log during initialization. org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Unexpected error deserializing mutation; saved to /tmp/mutation8318204837345269856dat. This may be caused by replaying a mutation against a table with the same name but incompatible schema. Exception follows: java.lang.AssertionError at org.apache.cassandra.db.commitlog.CommitLogReader.readMutation(CommitLogReader.java:471) at org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:404) at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:251) at org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:132) at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:137) at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:189) at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:170) at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:331) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:630) at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:791){code} I have attached the system.log when starting up the 3.11.17 node. I also attached the data folder generated from the 2.2.19, start up 3.0.30 or 3.11.17 with this data folder can directly expose the error. h1. Reproduce2 (Drain before upgrade) If DRAIN before the upgrade, the upgrade can finish, but the read for the table will fail and generate the following exception. It can be reproduced upgrading from 2.2 to both 3.0.30 and 3.11.17 h2. 2.2.19 => 3.0.30 with DRAIN Execute the same command in Reproduce1 in 2.2.19, when stop 2.2.19 node, use drain {code:java} bin/nodetool -h ::FFFF:127.0.0.1 drain bin/nodetool -h ::FFFF:127.0.0.1 stopdaemon{code} Start up the new version node (3.0.30), perform a read operation {code:java} ➜ apache-cassandra-3.0.30 bin/cqlsh Connected to Test Cluster at 127.0.0.1:9042. [cqlsh 5.0.1 | Cassandra 3.0.30 | CQL spec 3.4.0 | Native protocol v4] Use HELP for help. cqlsh> SELECT * FROM ks.tb; ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed - received 0 responses and 1 failures" info={'failures': 1, 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'} {code} system log {code:java} ERROR [SharedPool-Worker-2] 2024-04-27 16:09:55,263 AbstractLocalAwareExecutorService.java:166 - Uncaught exception on thread Thread[SharedPool-Worker-2,10,main] java.lang.AssertionError: null at org.apache.cassandra.db.rows.BufferCell.<init>(BufferCell.java:48) at org.apache.cassandra.db.LegacyLayout$CellGrouper.addCell(LegacyLayout.java:1476) at org.apache.cassandra.db.LegacyLayout$CellGrouper.addAtom(LegacyLayout.java:1395) at org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.readRow(UnfilteredDeserializer.java:554) at org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.hasNext(UnfilteredDeserializer.java:510) at org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer.hasNext(UnfilteredDeserializer.java:336) at org.apache.cassandra.db.columniterator.AbstractSSTableIterator.readStaticRow(AbstractSSTableIterator.java:148) at org.apache.cassandra.db.columniterator.AbstractSSTableIterator.<init>(AbstractSSTableIterator.java:104) at org.apache.cassandra.db.columniterator.SSTableIterator.<init>(SSTableIterator.java:51) at org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(BigTableReader.java:76) at org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:366) at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48) at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.metadata(LazilyInitializedUnfilteredRowIterator.java:53) at org.apache.cassandra.db.transform.RTBoundValidator.applyToPartition(RTBoundValidator.java:60) at org.apache.cassandra.db.transform.RTBoundValidator.applyToPartition(RTBoundValidator.java:34) at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:96) at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:369) at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:189) at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:158) at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$4.hasNext(UnfilteredPartitionIterators.java:220) at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:92) at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:292) at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:187) at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:180) at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:176) at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:76) at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:347) at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1914) at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2652) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134) at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) at java.lang.Thread.run(Thread.java:750) {code} h2. 2.2.19 => 3.11.17 with DRAIN Execute the same command in Reproduce1 in 2.2.19, when stop 2.2.19 node, use drain {code:java} bin/nodetool -h ::FFFF:127.0.0.1 drain bin/nodetool -h ::FFFF:127.0.0.1 stopdaemon{code} Start up the new version node (3.11.17), perform a read operation {code:java} ➜ apache-cassandra-3.11.17 bin/cqlsh Connected to Test Cluster at 127.0.0.1:9042. [cqlsh 5.0.1 | Cassandra 3.11.17 | CQL spec 3.4.4 | Native protocol v4] Use HELP for help. cqlsh> SELECT * FROM ks.tb; ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed - received 0 responses and 1 failures" info={'failures': 1, 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'} {code} system log {code:java} ERROR [ReadStage-2] 2024-04-27 02:10:24,085 AbstractLocalAwareExecutorService.java:166 - Uncaught exception on thread Thread[ReadStage-2,10,main] java.lang.AssertionError: null at org.apache.cassandra.db.rows.BufferCell.<init>(BufferCell.java:44) at org.apache.cassandra.db.LegacyLayout$CellGrouper.addCell(LegacyLayout.java:1489) at org.apache.cassandra.db.LegacyLayout$CellGrouper.addAtom(LegacyLayout.java:1407) at org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.readRow(UnfilteredDeserializer.java:555) at org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.hasNext(UnfilteredDeserializer.java:511) at org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer.hasNext(UnfilteredDeserializer.java:336) at org.apache.cassandra.db.columniterator.AbstractSSTableIterator.readStaticRow(AbstractSSTableIterator.java:177) at org.apache.cassandra.db.columniterator.AbstractSSTableIterator.<init>(AbstractSSTableIterator.java:113) at org.apache.cassandra.db.columniterator.SSTableIterator.<init>(SSTableIterator.java:49) at org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(BigTableReader.java:72) at org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:392) at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48) at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.metadata(LazilyInitializedUnfilteredRowIterator.java:58) at org.apache.cassandra.db.transform.RTBoundValidator.applyToPartition(RTBoundValidator.java:60) at org.apache.cassandra.db.transform.RTBoundValidator.applyToPartition(RTBoundValidator.java:34) at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:96) at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374) at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186) at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155) at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$4.hasNext(UnfilteredPartitionIterators.java:233) at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:92) at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:305) at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:187) at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:180) at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:176) at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:76) at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:360) at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:2007) at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2773) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134) at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:113) at java.lang.Thread.run(Thread.java:750) {code} > Unexpected error deserializing mutation when upgrade from 2.2.19 to > 3.0.30/3.11.17 > ---------------------------------------------------------------------------------- > > Key: CASSANDRA-19590 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19590 > Project: Cassandra > Issue Type: Bug > Components: Local/Commit Log > Reporter: Klay > Priority: Normal > Attachments: data.tar.gz, system.log > > > I am trying to upgrade from 2.2.19 to 3.0.30/3.11.17. I encountered the > following exception during the upgrade process and the 3.0.30/3.11.17 node > cannot start up. > {code:java} > ERROR [main] 2024-04-25 18:46:10,496 JVMStabilityInspector.java:124 - Exiting > due to error while processing commit log during initialization. > org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: > Unexpected error deserializing mutation; saved to > /tmp/mutation8318204837345269856dat. This may be caused by replaying a > mutation against a table with the same name but incompatible schema. > Exception follows: java.lang.AssertionError > at > org.apache.cassandra.db.commitlog.CommitLogReader.readMutation(CommitLogReader.java:471) > at > org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:404) > at > org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:251) > at > org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:132) > at > org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:137) > at > org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:189) > at > org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:170) > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:331) > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:630) > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:791) > {code} > h1. Reproduce1 (Flush before upgrade) > Upgrade fails when replaying the commit log. > This can be reproduced deterministically by > 1. Start up cassandra-2.2.19, singe node is enough (Using default > configuration) > 2. Execute the following commands in cqlsh > {code:java} > CREATE KEYSPACE ks WITH REPLICATION = { 'class' : 'SimpleStrategy', > 'replication_factor' : 1 }; > CREATE TABLE ks.tb (c1 INT, c0 INT, PRIMARY KEY (c1)); > INSERT INTO ks.tb (c1, c0) VALUES (0, 0); > ALTER TABLE ks.tb DROP c0 ; > ALTER TABLE ks.tb ADD c0 set<INT> ; > {code} > 3. Stop the old version. > {code:java} > bin/nodetool -h ::FFFF:127.0.0.1 flush > bin/nodetool -h ::FFFF:127.0.0.1 stopdaemon{code} > 4. Copy the data and start up the new version node (3.0.30 or 3.11.17) > Upgrade crashes with the following error > {code:java} > ERROR [main] 2024-04-25 18:46:10,496 JVMStabilityInspector.java:124 - Exiting > due to error while processing commit log during initialization. > org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: > Unexpected error deserializing mutation; saved to > /tmp/mutation8318204837345269856dat. This may be caused by replaying a > mutation against a table with the same name but incompatible schema. > Exception follows: java.lang.AssertionError > at > org.apache.cassandra.db.commitlog.CommitLogReader.readMutation(CommitLogReader.java:471) > at > org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:404) > at > org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:251) > at > org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:132) > at > org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:137) > at > org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:189) > at > org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:170) > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:331) > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:630) > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:791){code} > I have attached the system.log when starting up the 3.11.17 node. > I also attached the data folder generated from the 2.2.19, start up 3.0.30 or > 3.11.17 with this data folder can directly expose the error. > h1. Reproduce2 (Drain before upgrade) > If DRAIN before the upgrade, the upgrade can finish, but the read for the > table will fail and generate the following exception. It can be reproduced > upgrading from 2.2 to both 3.0.30 and 3.11.17 > h2. 2.2.19 => 3.0.30 with DRAIN > Execute the same command in Reproduce1 in 2.2.19, when stop 2.2.19 node, use > drain > {code:java} > bin/nodetool -h ::FFFF:127.0.0.1 drain > bin/nodetool -h ::FFFF:127.0.0.1 stopdaemon{code} > Start up the new version node (3.0.30), perform a read operation > {code:java} > ➜ apache-cassandra-3.0.30 bin/cqlsh > Connected to Test Cluster at 127.0.0.1:9042. > [cqlsh 5.0.1 | Cassandra 3.0.30 | CQL spec 3.4.0 | Native protocol v4] > Use HELP for help. > cqlsh> SELECT * FROM ks.tb; > ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] > message="Operation failed - received 0 responses and 1 failures" > info={'failures': 1, 'received_responses': 0, 'required_responses': 1, > 'consistency': 'ONE'} {code} > system log > {code:java} > ERROR [SharedPool-Worker-2] 2024-04-27 16:09:55,263 > AbstractLocalAwareExecutorService.java:166 - Uncaught exception on thread > Thread[SharedPool-Worker-2,10,main] > java.lang.AssertionError: null > at org.apache.cassandra.db.rows.BufferCell.<init>(BufferCell.java:48) > at > org.apache.cassandra.db.LegacyLayout$CellGrouper.addCell(LegacyLayout.java:1476) > at > org.apache.cassandra.db.LegacyLayout$CellGrouper.addAtom(LegacyLayout.java:1395) > at > org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.readRow(UnfilteredDeserializer.java:554) > at > org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.hasNext(UnfilteredDeserializer.java:510) > at > org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer.hasNext(UnfilteredDeserializer.java:336) > at > org.apache.cassandra.db.columniterator.AbstractSSTableIterator.readStaticRow(AbstractSSTableIterator.java:148) > at > org.apache.cassandra.db.columniterator.AbstractSSTableIterator.<init>(AbstractSSTableIterator.java:104) > at > org.apache.cassandra.db.columniterator.SSTableIterator.<init>(SSTableIterator.java:51) > at > org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(BigTableReader.java:76) > at > org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:366) > at > org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48) > at > org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.metadata(LazilyInitializedUnfilteredRowIterator.java:53) > at > org.apache.cassandra.db.transform.RTBoundValidator.applyToPartition(RTBoundValidator.java:60) > at > org.apache.cassandra.db.transform.RTBoundValidator.applyToPartition(RTBoundValidator.java:34) > at > org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:96) > at > org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:369) > at > org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:189) > at > org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:158) > at > org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) > at > org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$4.hasNext(UnfilteredPartitionIterators.java:220) > at > org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:92) > at > org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:292) > at > org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:187) > at > org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:180) > at > org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:176) > at > org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:76) > at > org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:347) > at > org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1914) > at > org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2652) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134) > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) > at java.lang.Thread.run(Thread.java:750) {code} > h2. 2.2.19 => 3.11.17 with DRAIN > Execute the same command in Reproduce1 in 2.2.19, when stop 2.2.19 node, use > drain > {code:java} > bin/nodetool -h ::FFFF:127.0.0.1 drain > bin/nodetool -h ::FFFF:127.0.0.1 stopdaemon{code} > Start up the new version node (3.11.17), perform a read operation > {code:java} > ➜ apache-cassandra-3.11.17 bin/cqlsh > Connected to Test Cluster at 127.0.0.1:9042. > [cqlsh 5.0.1 | Cassandra 3.11.17 | CQL spec 3.4.4 | Native protocol v4] > Use HELP for help. > cqlsh> SELECT * FROM ks.tb; > ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] > message="Operation failed - received 0 responses and 1 failures" > info={'failures': 1, 'received_responses': 0, 'required_responses': 1, > 'consistency': 'ONE'} {code} > system log > {code:java} > ERROR [ReadStage-2] 2024-04-27 02:10:24,085 > AbstractLocalAwareExecutorService.java:166 - Uncaught exception on thread > Thread[ReadStage-2,10,main] > java.lang.AssertionError: null > at org.apache.cassandra.db.rows.BufferCell.<init>(BufferCell.java:44) > at > org.apache.cassandra.db.LegacyLayout$CellGrouper.addCell(LegacyLayout.java:1489) > at > org.apache.cassandra.db.LegacyLayout$CellGrouper.addAtom(LegacyLayout.java:1407) > at > org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.readRow(UnfilteredDeserializer.java:555) > at > org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.hasNext(UnfilteredDeserializer.java:511) > at > org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer.hasNext(UnfilteredDeserializer.java:336) > at > org.apache.cassandra.db.columniterator.AbstractSSTableIterator.readStaticRow(AbstractSSTableIterator.java:177) > at > org.apache.cassandra.db.columniterator.AbstractSSTableIterator.<init>(AbstractSSTableIterator.java:113) > at > org.apache.cassandra.db.columniterator.SSTableIterator.<init>(SSTableIterator.java:49) > at > org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(BigTableReader.java:72) > at > org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:392) > at > org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48) > at > org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.metadata(LazilyInitializedUnfilteredRowIterator.java:58) > at > org.apache.cassandra.db.transform.RTBoundValidator.applyToPartition(RTBoundValidator.java:60) > at > org.apache.cassandra.db.transform.RTBoundValidator.applyToPartition(RTBoundValidator.java:34) > at > org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:96) > at > org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374) > at > org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186) > at > org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155) > at > org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) > at > org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$4.hasNext(UnfilteredPartitionIterators.java:233) > at > org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:92) > at > org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:305) > at > org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:187) > at > org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:180) > at > org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:176) > at > org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:76) > at > org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:360) > at > org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:2007) > at > org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2773) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134) > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:113) > at java.lang.Thread.run(Thread.java:750) {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org