[ https://issues.apache.org/jira/browse/KAFKA-17354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17939349#comment-17939349 ]
Ao Li commented on KAFKA-17354:
-------------------------------

Yes, the bug still appears. Here is the log I got from the test failures:

[2025-03-28 19:33:37,942] WARN Using an OS temp directory in the state.dir property can cause failures with writing the checkpoint file due to the fact that this directory can be cleared by the OS. Resolved state.dir: [/var/folders/76/3qcdfkw112xbkvbl6hvq0d6c0000gp/T/kafka-4724884686315533877] (org.apache.kafka.streams.processor.internals.StateDirectory:154)
[2025-03-28 19:33:37,966] INFO stream-thread [stream-thread-test-87bf53a8-54f2-485f-a4b6-acdbec0a8b3d-StreamThread-1] Creating restore consumer client (org.apache.kafka.streams.processor.internals.StreamThread:392)
[2025-03-28 19:33:37,989] INFO stream-thread [stream-thread-test-87bf53a8-54f2-485f-a4b6-acdbec0a8b3d-StreamThread-1] Creating consumer client (org.apache.kafka.streams.processor.internals.StreamThread:470)
[2025-03-28 19:33:38,012] INFO stream-thread [stream-thread-test-87bf53a8-54f2-485f-a4b6-acdbec0a8b3d-StreamThread-1] Starting (org.apache.kafka.streams.processor.internals.StreamThread:686)
[2025-03-28 19:33:38,014] INFO stream-thread [stream-thread-test-87bf53a8-54f2-485f-a4b6-acdbec0a8b3d-StreamThread-1] State transition from CREATED to STARTING (org.apache.kafka.streams.processor.internals.StreamThread:254)
[2025-03-28 19:33:38,017] INFO stream-thread [stream-thread-test-87bf53a8-54f2-485f-a4b6-acdbec0a8b3d-StreamThread-1] Informed to shut down (org.apache.kafka.streams.processor.internals.StreamThread:1551)
[2025-03-28 19:33:38,018] INFO stream-thread [stream-thread-test-87bf53a8-54f2-485f-a4b6-acdbec0a8b3d-StreamThread-1] State transition from STARTING to PENDING_SHUTDOWN (org.apache.kafka.streams.processor.internals.StreamThread:254)
[2025-03-28 19:33:38,020] INFO stream-thread [stream-thread-test-87bf53a8-54f2-485f-a4b6-acdbec0a8b3d-StreamThread-1] Shutting down clean (org.apache.kafka.streams.processor.internals.StreamThread:1565)
[2025-03-28 19:33:38,022] INFO stream-thread [stream-thread-test-87bf53a8-54f2-485f-a4b6-acdbec0a8b3d-StreamThread-1] Shutdown complete (org.apache.kafka.streams.processor.internals.TaskManager:1541)
[2025-03-28 19:33:38,025] INFO stream-thread [stream-thread-test-87bf53a8-54f2-485f-a4b6-acdbec0a8b3d-StreamThread-1] State transition from PENDING_SHUTDOWN to DEAD (org.apache.kafka.streams.processor.internals.StreamThread:254)
[2025-03-28 19:33:38,071] INFO stream-thread [stream-thread-test-87bf53a8-54f2-485f-a4b6-acdbec0a8b3d-StreamThread-1] Informed to shut down (org.apache.kafka.streams.processor.internals.StreamThread:1551)
[2025-03-28 19:33:38,073] INFO stream-thread [stream-thread-test-87bf53a8-54f2-485f-a4b6-acdbec0a8b3d-StreamThread-1] Shutdown complete (org.apache.kafka.streams.processor.internals.TaskManager:1541)
[2025-03-28 19:33:38,074] INFO stream-thread [stream-thread-test-87bf53a8-54f2-485f-a4b6-acdbec0a8b3d-StreamThread-1] Informed to shut down (org.apache.kafka.streams.processor.internals.StreamThread:1551)

Bug found in iteration test shouldChangeStateAtStartClose() repetition 0 of 100, you may find detailed report and replay files in /Users/aoli/repos/kafka/streams/build/fray/fray-report

java.lang.RuntimeException: State mismatch PENDING_SHUTDOWN different from STARTING
java.lang.AssertionError: java.lang.RuntimeException: State mismatch PENDING_SHUTDOWN different from STARTING
    at org.pastalab.fray.junit.junit5.FrayExtension.afterEach(FrayExtension.kt:53)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
    at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
    at java.base/java.util.stream.SliceOps$1$1.accept(SliceOps.java:200)
    at java.base/java.util.stream.Stream$1.tryAdvance(Stream.java:1469)
    at java.base/java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:129)
    at java.base/java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:527)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:513)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
    at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:276)
    at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1708)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
Caused by: java.lang.RuntimeException: State mismatch PENDING_SHUTDOWN different from STARTING
    at org.apache.kafka.streams.processor.internals.StreamThreadTest$StateListenerStub.onChange(StreamThreadTest.java:362)
    at org.apache.kafka.streams.processor.internals.StreamThread.setState(StreamThread.java:266)
    at org.apache.kafka.streams.processor.internals.StreamThread.completeShutdown(StreamThread.java:1604)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:700)

State mismatch PENDING_SHUTDOWN different from STARTING
java.lang.RuntimeException: State mismatch PENDING_SHUTDOWN different from STARTING
    at org.apache.kafka.streams.processor.internals.StreamThreadTest$StateListenerStub.onChange(StreamThreadTest.java:362)
    at org.apache.kafka.streams.processor.internals.StreamThread.setState(StreamThread.java:266)
    at org.apache.kafka.streams.processor.internals.StreamThread.completeShutdown(StreamThread.java:1604)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:700)

Exception: java.lang.RuntimeException thrown from the UncaughtExceptionHandler in thread "stream-thread-test-87bf53a8-54f2-485f-a4b6-acdbec0a8b3d-StreamThread-1"

> StreamThread::setState race condition causes java.lang.RuntimeException:
> State mismatch PENDING_SHUTDOWN different from STARTING
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-17354
> URL: https://issues.apache.org/jira/browse/KAFKA-17354
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Reporter: Ao Li
> Assignee: Anton Liauchuk
> Priority: Major
>
> I saw a test failure in `StreamThreadTest::shouldChangeStateAtStartClose`. A
> race condition in `setState` causes an uncaught exception thrown in
> `StateListenerStub`.
> Basically, the function `setState` allows two threads to call
> `stateListener.onChange` concurrently.
> This patch will help you to reproduce the failure deterministically.
> https://github.com/aoli-al/kafka/commit/033a9a33766740e6843effb9beabfdcb3804846b

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
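
For readers following the issue, here is a minimal sketch of how two threads calling `stateListener.onChange` concurrently could produce this exact message. It is not the Kafka source: `SetStateRaceSketch`, `Machine`, `CheckingListener`, and `replayInterleaving` are hypothetical names. The sketch assumes the state is updated under a lock while the listener is notified after the lock is released. The interleaving is replayed on a single thread so the result is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; none of these types are the Kafka classes.
final class SetStateRaceSketch {
    enum State { CREATED, STARTING, PENDING_SHUTDOWN, DEAD }

    /** Hypothetical listener: each oldState must equal the previously reported newState. */
    static final class CheckingListener {
        private State lastNewState; // null until the first notification arrives
        final List<String> errors = new ArrayList<>();

        void onChange(State newState, State oldState) {
            if (lastNewState != null && lastNewState != oldState) {
                errors.add("State mismatch " + oldState + " different from " + lastNewState);
            }
            lastNewState = newState;
        }
    }

    /** Hypothetical state holder: updates under a lock, notifies outside it (assumption). */
    static final class Machine {
        private final Object stateLock = new Object();
        private State state = State.CREATED;

        /** Does the locked update and returns the notification the caller would run next. */
        Runnable transition(State newState, CheckingListener listener) {
            synchronized (stateLock) {
                final State oldState = state;
                state = newState;
                // The lock is released before this Runnable runs, so another
                // thread can complete its own transition and notify first.
                return () -> listener.onChange(newState, oldState);
            }
        }
    }

    /** Replays one bad interleaving and returns the mismatches the listener recorded. */
    static List<String> replayInterleaving() {
        CheckingListener listener = new CheckingListener();
        Machine machine = new Machine();

        // CREATED -> STARTING is delivered normally.
        machine.transition(State.STARTING, listener).run();
        // The thread requesting shutdown updates the state, then pauses before notifying.
        Runnable fromShutdownCaller = machine.transition(State.PENDING_SHUTDOWN, listener);
        // The stream thread moves to DEAD and notifies first.
        Runnable fromStreamThread = machine.transition(State.DEAD, listener);
        fromStreamThread.run();
        fromShutdownCaller.run();
        return listener.errors;
    }

    public static void main(String[] args) {
        replayInterleaving().forEach(System.out::println);
    }
}
```

In this sketch, running the notification inside the `synchronized` block would keep the callbacks in transition order, at the cost of holding the lock during the callback. Whether the actual fix takes that approach is not stated in this thread.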