Abe Ratnofsky created CASSANDRA-19427:
-----------------------------------------

             Summary: Fix concurrent access of ClientWarn causing AIOBE for 
SELECT WHERE IN queries with multiple coordinator-local partitions
                 Key: CASSANDRA-19427
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19427
             Project: Cassandra
          Issue Type: Bug
          Components: Consistency/Coordination, Legacy/Local Write-Read Paths
            Reporter: Abe Ratnofsky
            Assignee: Abe Ratnofsky


On one of our clusters, we noticed rare but periodic 
ArrayIndexOutOfBoundsExceptions:

 
{code:java}
message="Uncaught exception on thread Thread[ReadStage-3,5,main]"
exception="java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2579)
at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ArrayIndexOutOfBoundsException"{code}
 

 

The error was in a Runnable, so the stacktrace didn't directly indicate where 
the error was coming from. We enabled JFR to log the underlying exception that 
was thrown:
 
{code:java}
message="Uncaught exception on thread Thread[ReadStage-2,5,main]" 
exception="java.lang.RuntimeException: 
java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 0
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2579)
at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for 
length 0
at java.base/java.util.ArrayList.add(ArrayList.java:487)
at java.base/java.util.ArrayList.add(ArrayList.java:499)
at org.apache.cassandra.service.ClientWarn$State.add(ClientWarn.java:84)
at org.apache.cassandra.service.ClientWarn$State.access$000(ClientWarn.java:77)
at org.apache.cassandra.service.ClientWarn.warn(ClientWarn.java:51)
at 
org.apache.cassandra.db.ReadCommand$1MetricRecording.onClose(ReadCommand.java:596)
at 
org.apache.cassandra.db.transform.BasePartitions.runOnClose(BasePartitions.java:70)
at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:95)
at 
org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:2260)
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2575)
... 6 more"{code}
 
 

An AIOBE on ArrayList.add(E) should only be possible when multiple threads 
attempt to call the method at the same time.

 

This was seen while executing a SELECT WHERE IN query with multiple partition 
keys. This exception could happen when multiple local reads are dispatched by 
the coordinator in 
org.apache.cassandra.service.reads.AbstractReadExecutor#makeRequests. In this 
case, multiple local reads exceed the tombstone warning threshold, so multiple 
tombstone warnings are added to the same ClientWarn.State reference.  
Currently, org.apache.cassandra.service.ClientWarn.State#warnings is an 
ArrayList, which isn't safe for concurrent modification, causing the AIOBE to 
be thrown.

 

I have a patch available for this, and I'm preparing it now. The patch is 
simple - it just changes org.apache.cassandra.service.ClientWarn.State#warnings 
to a thread-safe CopyOnWriteArrayList. I also have a jvm-dtest that 
demonstrates the issue but doesn't need to be merged - it shows how a SELECT 
WHERE IN query with local reads that add client warnings can add to the same 
ClientWarn.State from different threads. I'll push that in a separate branch 
just for demonstration purposes.

 

This appears to have been an issue since at least 3.11, that was the earliest 
release I checked.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to