[
https://issues.apache.org/jira/browse/CASSANDRA-18644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Brandon Williams updated CASSANDRA-18644:
-----------------------------------------
Bug Category: Parent values: Correctness(12982)Level 1 values: Transient
Incorrect Response(12987)
Complexity: Normal
Component/s: Cluster/Schema
Discovered By: User Report
Fix Version/s: 3.11.x
Severity: Low
Status: Open (was: Triage Needed)
> ERROR: ArrayIndexOutOfBoundsException: -1 in MigrationStage thread when
> shutdown Cassandra-3.11.15 node
> -------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-18644
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18644
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Schema
> Reporter: Ke Han
> Priority: Normal
> Fix For: 3.11.x
>
> Attachments: cassandra.yaml, system.log
>
>
> h2. Symptoms
> I was migrating from Cassandra-3.11.15 to Cassandra-4.1.2(Single node), and I
> encountered the following error message when I drained and shutdown the
> 3.11.15 node.
> {code:java}
> INFO [MigrationStage:1] 2023-06-30 02:41:08,822 ColumnFamilyStore.java:432 -
> Initializing uuidff0deb8dbf894fb1b3f59c0eafb94467.xlxzwsuwi.dbqxbgocor
> INFO [MutationStage-4] 2023-06-30 02:41:11,041 ColumnFamilyStore.java:2250 -
> Truncating uuidff0deb8dbf894fb1b3f59c0eafb94467.xlxzwsuwi
> INFO [MutationStage-4] 2023-06-30 02:41:13,547 ColumnFamilyStore.java:2304 -
> Truncate of uuidff0deb8dbf894fb1b3f59c0eafb94467.xlxzwsuwi is complete
> INFO [Native-Transport-Requests-5] 2023-06-30 02:41:13,552
> MigrationManager.java:362 - Drop Keyspace
> 'uuidff0deb8dbf894fb1b3f59c0eafb94467'
> INFO [IndexSummaryManager:1] 2023-06-30 02:41:36,223
> IndexSummaryRedistribution.java:78 - Redistributing index summaries
> INFO [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:46,637
> StorageService.java:1660 - DRAINING: starting drain process
> INFO [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:46,641
> HintsService.java:210 - Paused hints dispatch
> INFO [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:46,645
> Server.java:179 - Stop listening for CQL clients
> INFO [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:46,645
> Gossiper.java:1720 - Announcing shutdown
> INFO [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:46,648
> StorageService.java:2585 - Node /192.168.215.2 state jump to shutdown
> INFO [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:48,652
> MessagingService.java:985 - Waiting for messaging service to quiesce
> INFO [ACCEPT-/192.168.215.2] 2023-06-30 02:41:48,655
> MessagingService.java:1346 - MessagingService has terminated the accept()
> thread
> INFO [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:52,618
> HintsService.java:210 - Paused hints dispatch
> INFO [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:56,835
> StorageService.java:1660 - DRAINED
> ERROR [MigrationStage:1] 2023-06-30 02:41:56,838 CassandraDaemon.java:244 -
> Exception in thread Thread[MigrationStage:1,5,main]
> java.lang.ArrayIndexOutOfBoundsException: -1
> at java.util.ArrayList.elementData(ArrayList.java:424)
> at java.util.ArrayList.get(ArrayList.java:437)
> at
> org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager.forceRecycleAll(AbstractCommitLogSegmentManager.java:289)
> at
> org.apache.cassandra.db.commitlog.CommitLog.forceRecycleAllSegments(CommitLog.java:227)
> at org.apache.cassandra.config.Schema.dropKeyspace(Schema.java:663)
> at
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1428)
> at
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1413)
> at
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1390)
> at
> org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:464)
> at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
> at java.lang.Thread.run(Thread.java:750)
> ERROR [Native-Transport-Requests-5] 2023-06-30 02:41:56,839
> QueryMessage.java:129 - Unexpected error during query
> java.lang.RuntimeException: java.util.concurrent.ExecutionException:
> java.lang.ArrayIndexOutOfBoundsException: -1
> at
> org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:442)
> at
> org.apache.cassandra.service.MigrationManager.announce(MigrationManager.java:429)
> at
> org.apache.cassandra.service.MigrationManager.announceKeyspaceDrop(MigrationManager.java:363)
> at
> org.apache.cassandra.cql3.statements.DropKeyspaceStatement.announceMigration(DropKeyspaceStatement.java:63)
> at
> org.apache.cassandra.cql3.statements.SchemaAlteringStatement.execute(SchemaAlteringStatement.java:123)
> at
> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:243)
> at
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:274)
> at
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:259)
> at
> org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:116)
> at
> org.apache.cassandra.transport.Message$Dispatcher.processRequest(Message.java:688)
> at
> org.apache.cassandra.transport.Message$Dispatcher.lambda$channelRead0$0(Message.java:594)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:113)
> at java.lang.Thread.run(Thread.java:750)
> Caused by: java.util.concurrent.ExecutionException:
> java.lang.ArrayIndexOutOfBoundsException: -1
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at
> org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:438)
> ... 14 common frames omitted
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
> at java.util.ArrayList.elementData(ArrayList.java:424)
> at java.util.ArrayList.get(ArrayList.java:437)
> at
> org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager.forceRecycleAll(AbstractCommitLogSegmentManager.java:289)
> at
> org.apache.cassandra.db.commitlog.CommitLog.forceRecycleAllSegments(CommitLog.java:227)
> at org.apache.cassandra.config.Schema.dropKeyspace(Schema.java:663)
> at
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1428)
> at
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1413)
> at
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1390)
> at
> org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:464)
> at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
> ... 1 common frames omitted
> ERROR [Native-Transport-Requests-5] 2023-06-30 02:41:56,841
> ErrorMessage.java:384 - Unexpected exception during request
> java.lang.RuntimeException: java.util.concurrent.ExecutionException:
> java.lang.ArrayIndexOutOfBoundsException: -1
> at
> org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:442)
> at
> org.apache.cassandra.service.MigrationManager.announce(MigrationManager.java:429)
> at
> org.apache.cassandra.service.MigrationManager.announceKeyspaceDrop(MigrationManager.java:363)
> at
> org.apache.cassandra.cql3.statements.DropKeyspaceStatement.announceMigration(DropKeyspaceStatement.java:63)
> at
> org.apache.cassandra.cql3.statements.SchemaAlteringStatement.execute(SchemaAlteringStatement.java:123)
> at
> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:243)
> at
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:274)
> at
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:259)
> at
> org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:116)
> at
> org.apache.cassandra.transport.Message$Dispatcher.processRequest(Message.java:688)
> at
> org.apache.cassandra.transport.Message$Dispatcher.lambda$channelRead0$0(Message.java:594)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:113)
> at java.lang.Thread.run(Thread.java:750)
> Caused by: java.util.concurrent.ExecutionException:
> java.lang.ArrayIndexOutOfBoundsException: -1
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at
> org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:438)
> ... 14 common frames omitted
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
> at java.util.ArrayList.elementData(ArrayList.java:424)
> at java.util.ArrayList.get(ArrayList.java:437)
> at
> org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager.forceRecycleAll(AbstractCommitLogSegmentManager.java:289)
> at
> org.apache.cassandra.db.commitlog.CommitLog.forceRecycleAllSegments(CommitLog.java:227)
> at org.apache.cassandra.config.Schema.dropKeyspace(Schema.java:663)
> at
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1428)
> at
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1413)
> at
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1390)
> at
> org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:464)
> at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
> ... 1 common frames omitted
> INFO [RMI TCP Connection(4)-127.0.0.1] 2023-06-30 02:41:59,326
> CassandraDaemon.java:576 - Cassandra shutting down... {code}
> The _java.lang.ArrayIndexOutOfBoundsException_ is thrown in MigrationStage
> thread.
> h2. Root Cause
> From my preliminary analysis, this bug is related to thread interleaving
> between
> # Thread trying to shutdown the Cassandra instance.
> # MigrationStage thread trying to recycle segments.
> if user invokes {_}bin/nodetool stopdaemon{_}, the activeSegments in
> AbstractCommitLogSegmentManager will be empty.
> {code:java}
> // AbstractCommitLogSegmentManager.java
> /** Active segments, containing unflushed data. The tail of this queue is the
> one we allocate writes to */private final
> ConcurrentLinkedQueue<CommitLogSegment> activeSegments = new
> ConcurrentLinkedQueue<>(); {code}
> At this time, if the migration thread tries to invoke the forceRecycleAll
> function, it throws _ArrayIndexOutOfBoundsException_ if there's no active
> segment.
> {code:java}
> // AbstractCommitLogSegmentManager.java
> void forceRecycleAll(Iterable<UUID> droppedCfs)
> {
> List<CommitLogSegment> segmentsToRecycle = new
> ArrayList<>(activeSegments);
> CommitLogSegment last = segmentsToRecycle.get(segmentsToRecycle.size() -
> 1);
> advanceAllocatingFrom(last); // wait for the commit log modifications
> last.waitForModifications(); {code}
> It seems that this ERROR won't impact the data integrity. But I am not sure
> whether it will cause some problems in the segment recycling process.
> I think this should be handled considering its ERROR level in system.log. It
> might confuse users whether their data is still in an integrity state and
> whether they can perform the migration from 3.x to 4.x.
> I have attached my full system.log and cassandra.yaml file.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]