[
https://issues.apache.org/jira/browse/CASSANDRA-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17344146#comment-17344146
]
Ekaterina Dimitrova commented on CASSANDRA-16668:
-------------------------------------------------
Jenkins [CI run | https://jenkins-cm4.apache.org/job/Cassandra-devbranch/776/]
> Intermittent failure of
> SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race
> condition when shrinking maximum pool size to zero
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-16668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Other
> Reporter: Matt Fleming
> Assignee: Matt Fleming
> Priority: Normal
> Fix For: 4.0-rc
>
>
> A difficult-to-hit race condition exists in
> changingMaxWorkersMeetsConcurrencyGoalsTest when changing the maximum pool
> size from 0 -> 4 which results in the test failing like so:
> {{junit.framework.AssertionFailedError: Test tasks did not hit max
> concurrency goal expected:<true> but
> was:<false>junit.framework.AssertionFailedError: Test tasks did not hit max
> concurrency goal expected:<true> but was:<false> at
> org.apache.cassandra.concurrent.SEPExecutorTest.assertMaxTaskConcurrency(SEPExecutorTest.java:198)
> at
> org.apache.cassandra.concurrent.SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest(SEPExecutorTest.java:132)}}
> I can hit this issue maybe 2/3 times for every 100 invocations of the unit
> test.
> The issue that causes the failure is that if tasks are still enqueued when
> the maximum pool size is set to zero and if all of the SEPWorker threads
> enter the STOP state before the pool size is bumped to 4, then no SEPWorker
> threads will be spun up to service the task queue. This causes the above
> error.
> Why don't we spin up SEPWorker threads when enqueing tasks? Because of the
> guard logic in addTask:
> [https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java#L113,L121]
> In this scenario taskPermits will not be zero (because we have tasks on the
> queue) so we never call {{maybeStartSpinningWorker()}}.
> A trick to make this issue much easier to hit is to insert a
> {{Thread.sleep(500)}} immediately after setting the pool size to zero. This
> has the effect of guaranteeing that all SEPWorker threads will be STOP'd
> before enqueueing more work.
> Here's a fix that attempts to spin up an SEPWorker whenever we grow the
> number of work permits:
> https://github.com/mfleming/cassandra/commit/071516d29e41da9924af24e8002822d3c6af0e01
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]