[ https://issues.apache.org/jira/browse/KAFKA-17371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875170#comment-17875170 ]
Ao Li edited comment on KAFKA-17371 at 8/20/24 12:35 PM:
---------------------------------------------------------

[~frankvicky] Thanks for taking care of this! Making both `DefaultTaskExecutor::currentTask` and `DefaultTaskExecutor::runOnce` synchronized is a quick solution. However, I'm not sure whether this will introduce a performance regression.

[~chia7712] I'm currently running a concurrency testing tool that reruns all Kafka tests under different thread schedules. It has helped me find many bugs. While I can reproduce these failures and (sometimes) identify the root cause, I don't have enough context to fix them because I am not familiar with the codebase. Please let me know if you would like me to report these bugs through another channel.

> Flaky test in DefaultTaskExecutorTest.shouldUnassignTaskWhenRequired
> --------------------------------------------------------------------
>
>                 Key: KAFKA-17371
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17371
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Ao Li
>            Assignee: TengYao Chi
>            Priority: Minor
>
> Please see this fork https://github.com/aoli-al/kafka/tree/KAFKA-251 for a
> deterministic reproduction.
> The test failed with
> {code}
> expected: not <null>
> org.opentest4j.AssertionFailedError: expected: not <null>
>     at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:152)
>     at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
>     at org.junit.jupiter.api.AssertNotNull.failNull(AssertNotNull.java:49)
>     at org.junit.jupiter.api.AssertNotNull.assertNotNull(AssertNotNull.java:35)
>     at org.junit.jupiter.api.AssertNotNull.assertNotNull(AssertNotNull.java:30)
>     at org.junit.jupiter.api.Assertions.assertNotNull(Assertions.java:304)
>     at org.apache.kafka.streams.processor.internals.tasks.DefaultTaskExecutorTest.shouldUnassignTaskWhenRequired(DefaultTaskExecutorTest.java:233)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:580)
>     at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
>     at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
> {code}
> The root cause of the failure is that `currentTask =
> taskManager.assignNextTask(DefaultTaskExecutor.this);` is not an atomic
> operation: the call to `taskManager.assignNextTask` completes before its
> result is stored in `currentTask`. As soon as the call is made, it unblocks
> the `verify(taskManager,
> timeout(VERIFICATION_TIMEOUT)).assignNextTask(taskExecutor);` statement in
> the test method.
> If `assertNotNull(taskExecutor.currentTask());` then executes before the
> assignment `currentTask = [...]` takes effect, the test fails.
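
The interleaving described in the root cause can be forced with a small standalone sketch. It is not the Kafka code: `StubTaskManager`, `Executor`, `SynchronizedExecutor`, and `observe` are hypothetical names, the task is a plain `String`, and two latches stand in for Mockito's `verify(..., timeout(...))` and for the unlucky thread schedule. The field is marked `volatile` here so that the sketch isolates the ordering problem from visibility effects. The second variant mirrors the proposal to make both `runOnce` and `currentTask` synchronized.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class CurrentTaskRaceSketch {

    /** Hypothetical stand-in for the mocked TaskManager used by the test. */
    static final class StubTaskManager {
        // Counted down when assignNextTask is called; plays the role of verify(...).
        final CountDownLatch called = new CountDownLatch(1);
        // Counted down after the reader has looked at currentTask.
        final CountDownLatch observed = new CountDownLatch(1);

        String assignNextTask() {
            called.countDown();
            try {
                // Force the unlucky schedule: hold the return value back until the
                // reader has looked, or until the timeout expires.
                observed.await(500, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "task-0_0";
        }
    }

    /** Unsynchronized variant: the call and the assignment are two separate steps. */
    static class Executor {
        private final StubTaskManager taskManager;
        private volatile String currentTask;

        Executor(StubTaskManager taskManager) {
            this.taskManager = taskManager;
        }

        void runOnce() {
            currentTask = taskManager.assignNextTask();
        }

        String currentTask() {
            return currentTask;
        }
    }

    /** Variant with both methods synchronized on the same monitor. */
    static final class SynchronizedExecutor extends Executor {
        SynchronizedExecutor(StubTaskManager taskManager) {
            super(taskManager);
        }

        @Override
        synchronized void runOnce() {
            super.runOnce();
        }

        @Override
        synchronized String currentTask() {
            return super.currentTask();
        }
    }

    /** Returns what a reader sees right after assignNextTask has been called. */
    static String observe(boolean useSynchronized) {
        StubTaskManager manager = new StubTaskManager();
        Executor executor = useSynchronized
                ? new SynchronizedExecutor(manager)
                : new Executor(manager);
        Thread worker = new Thread(executor::runOnce, "task-executor");
        worker.start();
        try {
            manager.called.await();               // like verify(...): the call has been made
            String seen = executor.currentTask(); // like assertNotNull(taskExecutor.currentTask())
            manager.observed.countDown();
            worker.join();
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unsynchronized: " + observe(false));
        System.out.println("synchronized:   " + observe(true));
    }
}
```

Under this forced schedule the unsynchronized reader is expected to see `null`, while the synchronized reader blocks until `runOnce` has stored the task. The sketch also shows the cost raised in the comment above: with both methods synchronized, `currentTask()` callers wait for as long as `runOnce` holds the monitor.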