[ https://issues.apache.org/jira/browse/CASSANDRA-19427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821387#comment-17821387 ]
Stefan Miklosovic commented on CASSANDRA-19427:
-----------------------------------------------
[CASSANDRA-19427-4.0|https://github.com/instaclustr/cassandra/tree/CASSANDRA-19427-4.0]
{noformat}
java11_pre-commit_tests
✓ j11_build 1m 40s
✓ j11_cqlsh-dtests-py2-no-vnodes 6m 0s
✓ j11_cqlsh-dtests-py2-with-vnodes 5m 16s
✓ j11_cqlsh_dtests_py3 5m 8s
✓ j11_cqlsh_dtests_py311 5m 16s
✓ j11_cqlsh_dtests_py311_vnode 5m 45s
✓ j11_cqlsh_dtests_py38 5m 31s
✓ j11_cqlsh_dtests_py38_vnode 5m 25s
✓ j11_cqlsh_dtests_py3_vnode 5m 23s
✓ j11_cqlshlib_tests 7m 11s
✓ j11_dtests 32m 56s
✓ j11_dtests_vnode 34m 37s
✓ j11_jvm_dtests 11m 59s
✕ j11_unit_tests 8m 6s
    org.apache.cassandra.net.ConnectionTest testTimeout
    org.apache.cassandra.cql3.MemtableSizeTest testTruncationReleasesLogSpace
java11_separate_tests
java8_pre-commit_tests
java8_separate_tests
{noformat}
[java11_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3932/workflows/155c855b-bfcb-4c9e-a940-2848966d6bb1]
[java11_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3932/workflows/cd8a246e-5e62-48eb-bca3-7370d135b2dd]
[java8_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3932/workflows/ada174c4-2dc0-449a-9c88-162f69bdf7b5]
[java8_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3932/workflows/f3e98051-a2d4-4da0-bbc3-1f900166ce96]
> Fix concurrent access of ClientWarn causing AIOBE for SELECT WHERE IN queries with multiple coordinator-local partitions
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-19427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19427
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Coordination, Legacy/Local Write-Read Paths
> Reporter: Abe Ratnofsky
> Assignee: Abe Ratnofsky
> Priority: Normal
> Fix For: 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> On one of our clusters, we noticed rare but periodic
> ArrayIndexOutOfBoundsExceptions:
>
> {code:java}
> message="Uncaught exception on thread Thread[ReadStage-3,5,main]"
> exception="java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException
> 	at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2579)
> 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> 	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
> 	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
> 	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
> 	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> 	at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.ArrayIndexOutOfBoundsException"{code}
>
>
> The error was thrown inside a Runnable, and the wrapped cause was logged
> without a message or any stack frames of its own, so the stack trace didn't
> indicate where the error was coming from. We enabled JFR to log the
> underlying exception that was thrown:
>
> {code:java}
> message="Uncaught exception on thread Thread[ReadStage-2,5,main]"
> exception="java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 0
> 	at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2579)
> 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> 	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
> 	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
> 	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
> 	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> 	at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 0
> 	at java.base/java.util.ArrayList.add(ArrayList.java:487)
> 	at java.base/java.util.ArrayList.add(ArrayList.java:499)
> 	at org.apache.cassandra.service.ClientWarn$State.add(ClientWarn.java:84)
> 	at org.apache.cassandra.service.ClientWarn$State.access$000(ClientWarn.java:77)
> 	at org.apache.cassandra.service.ClientWarn.warn(ClientWarn.java:51)
> 	at org.apache.cassandra.db.ReadCommand$1MetricRecording.onClose(ReadCommand.java:596)
> 	at org.apache.cassandra.db.transform.BasePartitions.runOnClose(BasePartitions.java:70)
> 	at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:95)
> 	at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:2260)
> 	at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2575)
> 	... 6 more"{code}
>
>
> ArrayList is not thread-safe: an AIOBE from ArrayList.add(E) should only be
> possible when multiple threads call the method on the same list at the same
> time without external synchronization.
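Why an unsynchronized append can fail with exactly this message: as I understand JDK 11's ArrayList, the public add(E) (line 499 in the trace) passes the elementData and size fields to a private add(E, Object[], int) helper (line 487), so the two fields are read separately, and a default-constructed ArrayList starts out with a zero-length array until its first add. The sketch below replays, deterministically on one thread, one interleaving in which an append sees the old empty array but a size already bumped by another thread's append. SimpleList and its growth policy are hypothetical and only model that shape; this is not the JDK source. On JDK 11 or later the exception message should read "Index 1 out of bounds for length 0", matching the log.

```java
import java.util.Arrays;

class InterleavingDemo {
    /** Hypothetical, simplified model of an unsynchronized array-backed list (not the JDK source). */
    static final class SimpleList {
        Object[] elementData = new Object[0]; // starts empty, like a fresh ArrayList
        int size = 0;

        // An append first reads the two fields separately (no atomicity between them).
        Object[] readArray() { return elementData; }
        int readSize() { return size; }

        // ...then grows if full, stores the element, and publishes the new size.
        void finishAdd(Object e, Object[] data, int s) {
            if (s == data.length) {
                data = Arrays.copyOf(data, Math.max(10, data.length * 2)); // hypothetical growth policy
                elementData = data;
            }
            data[s] = e; // throws if data and s were read from different states of the list
            size = s + 1;
        }

        void add(Object e) { finishAdd(e, readArray(), readSize()); }
    }

    /** Replays one racy interleaving; returns the exception it produced, or null if none. */
    static ArrayIndexOutOfBoundsException replay() {
        SimpleList list = new SimpleList();
        // "Thread A" reads the backing array while it is still the empty one...
        Object[] staleArray = list.readArray();
        // ..."thread B" completes a whole append: the array grows and size becomes 1...
        list.add("warning from read B");
        // ...then "thread A" reads size, which already reflects B's append.
        int s = list.readSize();
        try {
            list.finishAdd("warning from read A", staleArray, s); // s == 1, staleArray.length == 0
            return null;
        } catch (ArrayIndexOutOfBoundsException ex) {
            return ex;
        }
    }

    public static void main(String[] args) {
        ArrayIndexOutOfBoundsException ex = replay();
        System.out.println(ex == null ? "no exception" : "AIOBE: " + ex.getMessage());
    }
}
```

Because the window between the two field reads is tiny, the real failure needs unlucky timing, which fits the "rare but periodic" pattern reported above.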
>
> This was seen while executing a SELECT WHERE IN query with multiple partition
> keys. The exception can occur when the coordinator dispatches multiple local
> reads in
> org.apache.cassandra.service.reads.AbstractReadExecutor#makeRequests. If
> several of those local reads exceed the tombstone warning threshold, each one
> adds a tombstone warning to the same ClientWarn.State reference from its own
> ReadStage thread. Currently,
> org.apache.cassandra.service.ClientWarn.State#warnings is an ArrayList, which
> isn't safe for concurrent modification, so the AIOBE is thrown.
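The fan-out described above can be modeled with a plain ExecutorService. In this hypothetical sketch, WarnState stands in for ClientWarn.State, the fixed thread pool stands in for ReadStage, and the coordinator's state reference is passed to each task explicitly; in Cassandra it reaches the ReadStage threads through the executor's own propagation of request-local state, which the sketch does not reproduce. Key names, tombstone counts, and the threshold are made up. With a thread-safe list every read's warning is recorded; swapping in a plain ArrayList can lose warnings or throw an AIOBE, but only intermittently, so that variant is not a reliable test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

class FanOutDemo {
    /** Hypothetical stand-in for ClientWarn.State: one instance per client request. */
    static final class WarnState {
        final List<String> warnings;
        WarnState(List<String> warnings) { this.warnings = warnings; }
        void warn(String text) { warnings.add(text); }
    }

    /**
     * Hypothetical coordinator: one local read per partition key on a shared pool.
     * Every read that crosses the threshold warns on the SAME WarnState from its own worker thread.
     */
    static List<String> runQuery(List<String> partitionKeys, int tombstonesPerRead,
                                 int warnThreshold, WarnState state) throws Exception {
        ExecutorService readStage = Executors.newFixedThreadPool(4);
        try {
            List<Future<?>> reads = new ArrayList<>();
            for (String key : partitionKeys) {
                reads.add(readStage.submit(() -> {
                    // The read itself is elided; on close, it checks how many tombstones it scanned.
                    if (tombstonesPerRead > warnThreshold)
                        state.warn("Read " + tombstonesPerRead + " tombstones for key " + key);
                }));
            }
            for (Future<?> f : reads)
                f.get(); // rethrows, wrapped in ExecutionException, anything a read threw
        } finally {
            readStage.shutdown();
        }
        return state.warnings;
    }

    public static void main(String[] args) throws Exception {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 100; i++) keys.add("k" + i);
        List<String> warnings =
            runQuery(keys, 1500, 1000, new WarnState(new CopyOnWriteArrayList<>()));
        System.out.println(warnings.size()); // 100: every read's warning was recorded
    }
}
```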
>
> I have a patch available for this and am preparing it now. The patch is
> simple: it changes org.apache.cassandra.service.ClientWarn.State#warnings to
> a thread-safe CopyOnWriteArrayList. I also have a jvm-dtest that demonstrates
> the issue but doesn't need to be merged; it shows how a SELECT WHERE IN query
> whose local reads add client warnings can add to the same ClientWarn.State
> from different threads. I'll push it to a separate branch just for
> demonstration purposes.
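A minimal sketch of the change as described: only the list implementation behind State#warnings changes. The State class here is a hypothetical simplification (String warnings, add and getWarnings methods), not the actual ClientWarn code. Besides making concurrent add calls safe, CopyOnWriteArrayList gives each iterator a snapshot of the list, which the main method shows on a single thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class FixedStateDemo {
    /** Hypothetical simplified State after the change; only the list implementation differs. */
    static final class State {
        // Before the change this was an ArrayList, which is unsafe under concurrent add().
        private final List<String> warnings = new CopyOnWriteArrayList<>();

        void add(String warning) { warnings.add(warning); }
        List<String> getWarnings() { return warnings; }
    }

    public static void main(String[] args) {
        State state = new State();
        state.add("first warning");
        // A CopyOnWriteArrayList iterator works on the array as it was when the iterator
        // was created: a later add neither breaks the iteration nor shows up in it.
        Iterator<String> it = state.getWarnings().iterator();
        state.add("second warning");
        List<String> seen = new ArrayList<>();
        it.forEachRemaining(seen::add);
        System.out.println(seen);                       // [first warning]
        System.out.println(state.getWarnings().size()); // 2
    }
}
```

CopyOnWriteArrayList copies its backing array on every add, which is cheap when a request accumulates only a handful of warnings. Collections.synchronizedList would also make add safe, but callers would then have to synchronize on the list manually while iterating it.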
>
> Demonstration branch:
> [https://github.com/apache/cassandra/compare/trunk...aratno:cassandra:CASSANDRA-19427-aiobe-clientwarn-demo]
> Fix branch:
> [https://github.com/apache/cassandra/compare/trunk...aratno:cassandra:CASSANDRA-19427-aiobe-clientwarn-fix]
> (PR linked below)
>
> This appears to have been an issue since at least 3.11; that was the earliest
> release I checked.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)