Ke Han created CASSANDRA-18644:
----------------------------------

             Summary: ERROR: ArrayIndexOutOfBoundsException: -1 in 
MigrationStage thread when shutdown Cassandra-3.11.15 node
                 Key: CASSANDRA-18644
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18644
             Project: Cassandra
          Issue Type: Bug
            Reporter: Ke Han
         Attachments: cassandra.yaml, system.log

h2. Symptoms

I was migrating from Cassandra-3.11.15 to Cassandra-4.1.2(Single node), and I 
encountered the following error message when I drained and shutdown the 3.11.15 
node.

 
{code:java}
INFO  [MigrationStage:1] 2023-06-30 02:41:08,822 ColumnFamilyStore.java:432 - 
Initializing uuidff0deb8dbf894fb1b3f59c0eafb94467.xlxzwsuwi.dbqxbgocor
INFO  [MutationStage-4] 2023-06-30 02:41:11,041 ColumnFamilyStore.java:2250 - 
Truncating uuidff0deb8dbf894fb1b3f59c0eafb94467.xlxzwsuwi
INFO  [MutationStage-4] 2023-06-30 02:41:13,547 ColumnFamilyStore.java:2304 - 
Truncate of uuidff0deb8dbf894fb1b3f59c0eafb94467.xlxzwsuwi is complete
INFO  [Native-Transport-Requests-5] 2023-06-30 02:41:13,552 
MigrationManager.java:362 - Drop Keyspace 'uuidff0deb8dbf894fb1b3f59c0eafb94467'
INFO  [IndexSummaryManager:1] 2023-06-30 02:41:36,223 
IndexSummaryRedistribution.java:78 - Redistributing index summaries
INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:46,637 
StorageService.java:1660 - DRAINING: starting drain process
INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:46,641 
HintsService.java:210 - Paused hints dispatch
INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:46,645 Server.java:179 
- Stop listening for CQL clients
INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:46,645 
Gossiper.java:1720 - Announcing shutdown
INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:46,648 
StorageService.java:2585 - Node /192.168.215.2 state jump to shutdown
INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:48,652 
MessagingService.java:985 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/192.168.215.2] 2023-06-30 02:41:48,655 
MessagingService.java:1346 - MessagingService has terminated the accept() thread
INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:52,618 
HintsService.java:210 - Paused hints dispatch
INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-06-30 02:41:56,835 
StorageService.java:1660 - DRAINED
ERROR [MigrationStage:1] 2023-06-30 02:41:56,838 CassandraDaemon.java:244 - 
Exception in thread Thread[MigrationStage:1,5,main]
java.lang.ArrayIndexOutOfBoundsException: -1
        at java.util.ArrayList.elementData(ArrayList.java:424)
        at java.util.ArrayList.get(ArrayList.java:437)
        at 
org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager.forceRecycleAll(AbstractCommitLogSegmentManager.java:289)
        at 
org.apache.cassandra.db.commitlog.CommitLog.forceRecycleAllSegments(CommitLog.java:227)
        at org.apache.cassandra.config.Schema.dropKeyspace(Schema.java:663)
        at 
org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1428)
        at 
org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1413)
        at 
org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1390)
        at 
org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:464)
        at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
        at java.lang.Thread.run(Thread.java:750)
ERROR [Native-Transport-Requests-5] 2023-06-30 02:41:56,839 
QueryMessage.java:129 - Unexpected error during query
java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
java.lang.ArrayIndexOutOfBoundsException: -1
        at 
org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:442)
        at 
org.apache.cassandra.service.MigrationManager.announce(MigrationManager.java:429)
        at 
org.apache.cassandra.service.MigrationManager.announceKeyspaceDrop(MigrationManager.java:363)
        at 
org.apache.cassandra.cql3.statements.DropKeyspaceStatement.announceMigration(DropKeyspaceStatement.java:63)
        at 
org.apache.cassandra.cql3.statements.SchemaAlteringStatement.execute(SchemaAlteringStatement.java:123)
        at 
org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:243)
        at 
org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:274)
        at 
org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:259)
        at 
org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:116)
        at 
org.apache.cassandra.transport.Message$Dispatcher.processRequest(Message.java:688)
        at 
org.apache.cassandra.transport.Message$Dispatcher.lambda$channelRead0$0(Message.java:594)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:113)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.util.concurrent.ExecutionException: 
java.lang.ArrayIndexOutOfBoundsException: -1
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at 
org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:438)
        ... 14 common frames omitted
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
        at java.util.ArrayList.elementData(ArrayList.java:424)
        at java.util.ArrayList.get(ArrayList.java:437)
        at 
org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager.forceRecycleAll(AbstractCommitLogSegmentManager.java:289)
        at 
org.apache.cassandra.db.commitlog.CommitLog.forceRecycleAllSegments(CommitLog.java:227)
        at org.apache.cassandra.config.Schema.dropKeyspace(Schema.java:663)
        at 
org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1428)
        at 
org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1413)
        at 
org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1390)
        at 
org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:464)
        at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
        ... 1 common frames omitted
ERROR [Native-Transport-Requests-5] 2023-06-30 02:41:56,841 
ErrorMessage.java:384 - Unexpected exception during request
java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
java.lang.ArrayIndexOutOfBoundsException: -1
        at 
org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:442)
        at 
org.apache.cassandra.service.MigrationManager.announce(MigrationManager.java:429)
        at 
org.apache.cassandra.service.MigrationManager.announceKeyspaceDrop(MigrationManager.java:363)
        at 
org.apache.cassandra.cql3.statements.DropKeyspaceStatement.announceMigration(DropKeyspaceStatement.java:63)
        at 
org.apache.cassandra.cql3.statements.SchemaAlteringStatement.execute(SchemaAlteringStatement.java:123)
        at 
org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:243)
        at 
org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:274)
        at 
org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:259)
        at 
org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:116)
        at 
org.apache.cassandra.transport.Message$Dispatcher.processRequest(Message.java:688)
        at 
org.apache.cassandra.transport.Message$Dispatcher.lambda$channelRead0$0(Message.java:594)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:113)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.util.concurrent.ExecutionException: 
java.lang.ArrayIndexOutOfBoundsException: -1
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at 
org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:438)
        ... 14 common frames omitted
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
        at java.util.ArrayList.elementData(ArrayList.java:424)
        at java.util.ArrayList.get(ArrayList.java:437)
        at 
org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager.forceRecycleAll(AbstractCommitLogSegmentManager.java:289)
        at 
org.apache.cassandra.db.commitlog.CommitLog.forceRecycleAllSegments(CommitLog.java:227)
        at org.apache.cassandra.config.Schema.dropKeyspace(Schema.java:663)
        at 
org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1428)
        at 
org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1413)
        at 
org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1390)
        at 
org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:464)
        at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
        ... 1 common frames omitted
INFO  [RMI TCP Connection(4)-127.0.0.1] 2023-06-30 02:41:59,326 
CassandraDaemon.java:576 - Cassandra shutting down... {code}
 

The _java.lang.ArrayIndexOutOfBoundsException_ is thrown in MigrationStage 
thread.
h2. Root Cause

This bug is related to thread interleaving between
 # Thread trying to shutdown the Cassandra instance.
 # MigrationStage thread trying to recycle segments.

if user invokes {_}bin/nodetool stopdaemon{_}, the activeSegments in 
AbstractCommitLogSegmentManager will be empty.

 
{code:java}
// AbstractCommitLogSegmentManager.java
/** Active segments, containing unflushed data. The tail of this queue is the 
one we allocate writes to */private final 
ConcurrentLinkedQueue<CommitLogSegment> activeSegments = new 
ConcurrentLinkedQueue<>(); {code}
 

At this time, if the migration thread tries to invoke the forceRecycleAll 
function, it throws ArrayIndexOutOfBoundsException if there's no active segment.

 
{code:java}
// AbstractCommitLogSegmentManager.java
void forceRecycleAll(Iterable<UUID> droppedCfs)
{
    List<CommitLogSegment> segmentsToRecycle = new ArrayList<>(activeSegments);
    CommitLogSegment last = segmentsToRecycle.get(segmentsToRecycle.size() - 1);
    advanceAllocatingFrom(last);    // wait for the commit log modifications
    last.waitForModifications(); {code}
It seems that this ERROR won't impact the data integrity. But I am not sure 
whether it will cause some problems in the segment recycling process.
I think this should be handled considering its ERROR level. It might confuse 
users whether their data is still in an integrity state => whether they can 
perform the migration from 3.x to 4.x.

I have attached my full system.log and cassandra.yaml file.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to