Matt Fleming created CASSANDRA-16668:
----------------------------------------
Summary: Intermittent failure of
SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race
condition when shrinking maximum pool size to zero
Key: CASSANDRA-16668
URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
Project: Cassandra
Issue Type: Bug
Reporter: Matt Fleming
A difficult-to-hit race condition exists in
changingMaxWorkersMeetsConcurrencyGoalsTest when changing the maximum pool size
from 0 -> 4 which results in the test failing like so:
{{junit.framework.AssertionFailedError: Test tasks did not hit max concurrency
goal expected:<true> but was:<false>junit.framework.AssertionFailedError: Test
tasks did not hit max concurrency goal expected:<true> but was:<false> at
org.apache.cassandra.concurrent.SEPExecutorTest.assertMaxTaskConcurrency(SEPExecutorTest.java:198)
at
org.apache.cassandra.concurrent.SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest(SEPExecutorTest.java:132)}}
I can hit this issue maybe 2/3 times for every 100 invocations of the unit test.
The issue that causes the failure is that if tasks are still enqueued when the
maximum pool size is set to zero and if all of the SEPWorker threads enter the
STOP state before the pool size is bumped to 4, then no SEPWorker threads will
be spun up to service the task queue. This causes the above error.
Why don't we spin up SEPWorker threads when enqueing tasks? Because of the
guard logic in addTask:
[https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java#L113,L121]
In this scenario taskPermits will not be zero (because we have tasks on the
queue) so we never call {{maybeStartSpinningWorker()}}.
A trick to make this issue much easier to hit is to insert a
{{Thread.sleep(500)}} immediately after setting the pool size to zero. This has
the effect of guaranteeing that all SEPWorker threads will be STOP'd before
enqueueing more work.
Here's a fix that attempts to spin up an SEPWorker whenever we grow the number
of work permits:
https://github.com/mfleming/cassandra/commit/071516d29e41da9924af24e8002822d3c6af0e01
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]