[ https://issues.apache.org/jira/browse/KAFKA-17515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881177#comment-17881177 ]
Yu-Lin Chen commented on KAFKA-17515:
-------------------------------------

Another minor finding about the flaky tests: both runs timed out, but CI reports different failure reasons. This is because:
* If the local state dir is purged before RocksDBStore.flush() runs, the test fails immediately. ([The first flaky link|https://ge.apache.org/s/havqcr7zu2tbk/tests/task/:streams:test/details/org.apache.kafka.streams.integration.RestoreIntegrationTest/shouldInvokeUserDefinedGlobalStateRestoreListener()?expanded-stacktrace=WyIwIl0&focused-execution=1&page=eyJvdXRwdXQiOnsiMCI6MSwiMSI6Mn19&top-execution=2#L177])
* If the local state dir is purged before the OffsetCheckpoint is written, the test only logs a warning and CI keeps running. ([The second flaky link|https://ge.apache.org/s/hdpapdbvngcts/tests/task/:streams:test/details/org.apache.kafka.streams.integration.RestoreIntegrationTest/shouldInvokeUserDefinedGlobalStateRestoreListener()?focused-execution=1&top-execution=2#L196])

I'm not sure whether the timeout indirectly caused the slow start of the ks-1 tasks in flaky test #2, but we can fix the known issue first.
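For illustration only, here is a minimal, self-contained Java sketch (no Kafka dependencies) of the kind of race behind the DirectoryNotEmptyException in the issue's stack trace: a walkFileTree-based recursive delete removes a directory's files first and the directory itself in postVisitDirectory, so if another thread writes a file into that directory in between, Files.deleteIfExists(dir) fails. The class name, the `purge`/`purgeFails` helpers, the `0_0/.checkpoint` layout and the `.checkpoint.tmp` file are hypothetical, and the "late write" is simulated deterministically through a hook rather than by a real stream thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.function.Consumer;

final class PurgeRaceSketch {

    // Recursive delete in the same shape as the walkFileTree-based delete in the stack trace:
    // files are removed in visitFile, and each directory is removed in postVisitDirectory
    // once its children are gone. `beforeDirDelete` is a hypothetical hook that runs just
    // before a directory is removed, standing in for a thread that is still writing.
    static void purge(Path root, Consumer<Path> beforeDirDelete) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.deleteIfExists(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                if (exc != null) {
                    throw exc;
                }
                beforeDirDelete.accept(dir);
                // Throws DirectoryNotEmptyException if a file appeared after the children were deleted.
                Files.deleteIfExists(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    // Builds a hypothetical state dir layout (<stateDir>/0_0/.checkpoint) and purges it.
    // Returns true if the purge hit DirectoryNotEmptyException.
    static boolean purgeFails(boolean writerStillRunning) throws IOException {
        Path stateDir = Files.createTempDirectory("ks1-state");
        Path taskDir = Files.createDirectory(stateDir.resolve("0_0"));
        Files.writeString(taskDir.resolve(".checkpoint"), "offsets");

        Consumer<Path> lateWriter = dir -> {
            if (writerStillRunning && dir.equals(taskDir)) {
                try {
                    // Simulates a late write (e.g. a flush or checkpoint) into the task dir.
                    Files.writeString(dir.resolve(".checkpoint.tmp"), "late");
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };

        try {
            purge(stateDir, lateWriter);
            return false;
        } catch (DirectoryNotEmptyException e) {
            purge(stateDir, dir -> { }); // clean up the leftover files with no concurrent writer
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("writer stopped before purge, purge fails: " + purgeFails(false));
        System.out.println("writer still running during purge, purge fails: " + purgeFails(true));
    }
}
```

Running `main` prints `false` for the first case and `true` for the second: the delete only fails when something writes into the task directory while the purge is in progress, which is why the ordering between the stream threads finishing their writes and the test's purge matters.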
> Fix flaky RestoreIntegrationTest.shouldInvokeUserDefinedGlobalStateRestoreListener
> ----------------------------------------------------------------------------------
>
>                 Key: KAFKA-17515
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17515
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams, unit tests
>            Reporter: Chia-Ping Tsai
>            Assignee: Chia-Ping Tsai
>            Priority: Major
>
> {code:java}
> Stacktrace
> java.nio.file.DirectoryNotEmptyException: /tmp/shouldInvokeUserDefinedGlobalStateRestoreListenerH0u0n9foRY_peZu4FqeGHQ10111145955704739924-ks1/shouldInvokeUserDefinedGlobalStateRestoreListenerH0u0n9foRY_peZu4FqeGHQ/0_0
> 	at java.base/sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:289)
> 	at java.base/sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:109)
> 	at java.base/java.nio.file.Files.deleteIfExists(Files.java:1191)
> 	at org.apache.kafka.common.utils.Utils$1.postVisitDirectory(Utils.java:898)
> 	at org.apache.kafka.common.utils.Utils$1.postVisitDirectory(Utils.java:870)
> 	at java.base/java.nio.file.Files.walkFileTree(Files.java:2803)
> 	at java.base/java.nio.file.Files.walkFileTree(Files.java:2857)
> 	at org.apache.kafka.common.utils.Utils.delete(Utils.java:870)
> 	at org.apache.kafka.streams.integration.utils.IntegrationTestUtils.purgeLocalStreamsState(IntegrationTestUtils.java:266)
> 	at org.apache.kafka.streams.integration.utils.IntegrationTestUtils.purgeLocalStreamsState(IntegrationTestUtils.java:278)
> 	at org.apache.kafka.streams.integration.RestoreIntegrationTest.shouldInvokeUserDefinedGlobalStateRestoreListener(RestoreIntegrationTest.java:583)
> 	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
> 	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
> 	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)