[ 
https://issues.apache.org/jira/browse/KAFKA-17515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882672#comment-17882672
 ] 

Yu-Lin Chen commented on KAFKA-17515:
-------------------------------------

The another flaky was caused by long initialize of stream tasks. It took longer 
than 60 secs so no record was processed.  ([flaky 
link|https://ge.apache.org/s/kzmyf3c3jo7fo/tests/task/:streams:test/details/org.apache.kafka.streams.integration.RestoreIntegrationTest/shouldInvokeUserDefinedGlobalStateRestoreListener()?top-execution=1])
{code:java}
org.opentest4j.AssertionFailedError: Condition not met within timeout 60000. 
Did not receive all [KeyValue(0, 0), KeyValue(1, 1), KeyValue(2, 2), 
KeyValue(3, 3), KeyValue(4, 4), KeyValue(5, 5), KeyValue(6, 6), KeyValue(7, 7), 
KeyValue(8, 8), KeyValue(9, 9), KeyValue(10, 10), KeyValue(11, 11), 
KeyValue(12, 12), KeyValue(13, 13), KeyValue(14, 14), KeyValue(15, 15), 
KeyValue(16, 16), KeyValue(17, 17), KeyValue(18, 18), KeyValue(19, 19), 
KeyValue(20, 20), KeyValue(21, 21), KeyValue(22, 22), KeyValue(23, 23), 
KeyValue(24, 24), KeyValue(25, 25), KeyValue(26, 26), KeyValue(27, 27), 
KeyValue(28, 28), KeyValue(29, 29), KeyValue(30, 30), KeyValue(31, 31), 
KeyValue(32, 32), KeyValue(33, 33), KeyValue(34, 34), KeyValue(35, 35), 
KeyValue(36, 36), KeyValue(37, 37), KeyValue(38, 38), KeyValue(39, 39), 
KeyValue(40, 40), KeyValue(41, 41), KeyValue(42, 42), KeyValue(43, 43), 
KeyValue(44, 44), KeyValue(45, 45), KeyValue(46, 46), KeyValue(47, 47), 
KeyValue(48, 48), KeyValue(49, 49), KeyValue(50, 50), KeyValue(51, 51), 
KeyValue(52, 52), KeyValue(53, 53), KeyValue(54, 54), KeyValue(55, 55), 
KeyValue(56, 56), KeyValue(57, 57), KeyValue(58, 58), KeyValue(59, 59), 
KeyValue(60, 60), KeyValue(61, 61), KeyValue(62, 62), KeyValue(63, 63), 
KeyValue(64, 64), KeyValue(65, 65), KeyValue(66, 66), KeyValue(67, 67), 
KeyValue(68, 68), KeyValue(69, 69), KeyValue(70, 70), KeyValue(71, 71), 
KeyValue(72, 72), KeyValue(73, 73), KeyValue(74, 74), KeyValue(75, 75), 
KeyValue(76, 76), KeyValue(77, 77), KeyValue(78, 78), KeyValue(79, 79), 
KeyValue(80, 80), KeyValue(81, 81), KeyValue(82, 82), KeyValue(83, 83), 
KeyValue(84, 84), KeyValue(85, 85), KeyValue(86, 86), KeyValue(87, 87), 
KeyValue(88, 88), KeyValue(89, 89), KeyValue(90, 90), KeyValue(91, 91), 
KeyValue(92, 92), KeyValue(93, 93), KeyValue(94, 94), KeyValue(95, 95), 
KeyValue(96, 96), KeyValue(97, 97), KeyValue(98, 98), KeyValue(99, 99)] records 
from topic outputTopic (got []) ==> expected: <true> but was: <false> {code}
 

 

Logs:
{code:java}
[2024-09-12 18:50:29,644] INFO ... ks1-StreamThread-1] State transition from 
CREATED to STARTING
[2024-09-12 18:50:38,263] INFO ... ks1-StreamThread-1] State transition from 
STARTING to PARTITIONS_ASSIGNED
[2024-09-12 18:50:48,813] INFO ... ks1-StreamThread-1] task [0_3] Initialized 
(org.apache.kafka.streams.processor.internals.StreamTask:277)
[2024-09-12 18:51:02,038] INFO ... ks1-StreamThread-1] task [0_0] Initialized 
(org.apache.kafka.streams.processor.internals.StreamTask:277)
[2024-09-12 18:51:14,316] INFO ... ks1-StreamThread-1] task [0_4] Initialized 
(org.apache.kafka.streams.processor.internals.StreamTask:277)
[2024-09-12 18:51:25,080] INFO ... ks1-StreamThread-1] task [0_1] Initialized 
(org.apache.kafka.streams.processor.internals.StreamTask:277)
[2024-09-12 18:51:37,862] INFO ... ks1-StreamThread-1] task [0_2] Initialized 
(org.apache.kafka.streams.processor.internals.StreamTask:277) {code}
In the flaky test run, each step takes the following amount of time for task 
initialization:
 # 8.619 sec for rebalancing
 # 10.550 sec for initializing task [0_3]
 # 13.225 sec for initializing task [0_0]
 # 12.278 sec for initializing task [0_4]
 # 10.764 sec for initializing task [0_1]
 # 12.782 sec for initializing task [0_2]

It took longer than 60 seconds. The followings are the tasks initialization 
time in other flaky test:
 # Failed Execution 
([Link|https://ge.apache.org/s/kzmyf3c3jo7fo/tests/task/:streams:test/details/org.apache.kafka.streams.integration.RestoreIntegrationTest/shouldInvokeUserDefinedGlobalStateRestoreListener()?top-execution=1]):
  68.218 sec (Started Sep 13 2024 at 02:50:08 CST)
 # Successed Execution 
([Link|https://ge.apache.org/s/kzmyf3c3jo7fo/tests/task/:streams:test/details/org.apache.kafka.streams.integration.RestoreIntegrationTest/shouldInvokeUserDefinedGlobalStateRestoreListener()?top-execution=2])
 6.718 sec (Started Sep 13 2024 at 05:40:21 CST)
 # Failed Execution 
([Link|https://ge.apache.org/s/g72utse4pdln2/tests/task/:streams:test/details/org.apache.kafka.streams.integration.RestoreIntegrationTest/shouldInvokeUserDefinedGlobalStateRestoreListener()?top-execution=1]):
  59.632 sec (Started Sep 14 2024 at 04:05:41 CST)
 # Passed Execution 
([Link|https://ge.apache.org/s/g72utse4pdln2/tests/task/:streams:test/details/org.apache.kafka.streams.integration.RestoreIntegrationTest/shouldInvokeUserDefinedGlobalStateRestoreListener()?top-execution=2]):
  7.891(Started Sep 14 2024 at 07:36:26 CST)

This issue doesn't occurs in the last three days, so I think we could fix the 
known issue in the PR first.
Keep moniotoring the flaky rate in the next following days to see whether we 
should loose the timeout again.

> Fix flaky 
> RestoreIntegrationTest.shouldInvokeUserDefinedGlobalStateRestoreListener
> ----------------------------------------------------------------------------------
>
>                 Key: KAFKA-17515
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17515
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams, unit tests
>            Reporter: Chia-Ping Tsai
>            Assignee: Yu-Lin Chen
>            Priority: Major
>         Attachments: Reproduced screenshoot in my env (Loop 7).png
>
>
> {code:java}
> Stacktrace
> java.nio.file.DirectoryNotEmptyException: 
> /tmp/shouldInvokeUserDefinedGlobalStateRestoreListenerH0u0n9foRY_peZu4FqeGHQ10111145955704739924-ks1/shouldInvokeUserDefinedGlobalStateRestoreListenerH0u0n9foRY_peZu4FqeGHQ/0_0
>       at 
> java.base/sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:289)
>       at 
> java.base/sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:109)
>       at java.base/java.nio.file.Files.deleteIfExists(Files.java:1191)
>       at 
> org.apache.kafka.common.utils.Utils$1.postVisitDirectory(Utils.java:898)
>       at 
> org.apache.kafka.common.utils.Utils$1.postVisitDirectory(Utils.java:870)
>       at java.base/java.nio.file.Files.walkFileTree(Files.java:2803)
>       at java.base/java.nio.file.Files.walkFileTree(Files.java:2857)
>       at org.apache.kafka.common.utils.Utils.delete(Utils.java:870)
>       at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.purgeLocalStreamsState(IntegrationTestUtils.java:266)
>       at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.purgeLocalStreamsState(IntegrationTestUtils.java:278)
>       at 
> org.apache.kafka.streams.integration.RestoreIntegrationTest.shouldInvokeUserDefinedGlobalStateRestoreListener(RestoreIntegrationTest.java:583)
>       at java.base/java.lang.reflect.Method.invoke(Method.java:580)
>       at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
>       at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to