Greg Harris created KAFKA-14338:
-----------------------------------
Summary: Connect RetryUtilTest flakey in CPU-limited environments
Key: KAFKA-14338
URL: https://issues.apache.org/jira/browse/KAFKA-14338
Project: Kafka
Issue Type: Test
Components: KafkaConnect
Reporter: Greg Harris
Assignee: Greg Harris
the RetryUtilTest added alongside the RetryUtil in
[https://github.com/apache/kafka/pull/11797] has some unresolved flakiness
issues in CPU restricted environments. I was able to reproduce two flakey
failures with a 2% CPU throttle in place:
{noformat}
1) testExhaustingRetries(org.apache.kafka.connect.util.RetryUtilTest)
org.junit.runners.model.TestTimedOutException: test timed out after 1000
milliseconds
at org.junit.internal.runners.MethodRoadie$1.run(MethodRoadie.java:78)
at
org.junit.internal.runners.MethodRoadie.runBeforesThenTestThenAfters(MethodRoadie.java:97)
at
org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.executeTest(PowerMockJUnit44RunnerDelegateImpl.java:310)
at
org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.executeTestInSuper(PowerMockJUnit47RunnerDelegateImpl.java:131)
at
org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.access$100(PowerMockJUnit47RunnerDelegateImpl.java:59)
at
org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner$TestExecutorStatement.evaluate(PowerMockJUnit47RunnerDelegateImpl.java:147)
at
org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.evaluateStatement(PowerMockJUnit47RunnerDelegateImpl.java:107)
at
org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.executeTest(PowerMockJUnit47RunnerDelegateImpl.java:82)
at
org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.runBeforesThenTestThenAfters(PowerMockJUnit44RunnerDelegateImpl.java:298)
at
org.junit.internal.runners.MethodRoadie.runWithTimeout(MethodRoadie.java:58)
at org.junit.internal.runners.MethodRoadie.run(MethodRoadie.java:48)
at
org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.invokeTestMethod(PowerMockJUnit44RunnerDelegateImpl.java:218)
at
org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.runMethods(PowerMockJUnit44RunnerDelegateImpl.java:160)
at
org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$1.run(PowerMockJUnit44RunnerDelegateImpl.java:134)
at
org.junit.internal.runners.ClassRoadie.runUnprotected(ClassRoadie.java:34)
at
org.junit.internal.runners.ClassRoadie.runProtected(ClassRoadie.java:44)
at
org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.run(PowerMockJUnit44RunnerDelegateImpl.java:136)
at
org.powermock.modules.junit4.common.internal.impl.JUnit4TestSuiteChunkerImpl.run(JUnit4TestSuiteChunkerImpl.java:117)
at
org.powermock.modules.junit4.common.internal.impl.AbstractCommonPowerMockRunner.run(AbstractCommonPowerMockRunner.java:57)
at
org.powermock.modules.junit4.PowerMockRunner.run(PowerMockRunner.java:59)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:27)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
at org.junit.runner.JUnitCore.runMain(JUnitCore.java:77)
at org.junit.runner.JUnitCore.main(JUnitCore.java:36)
2) retriesEventuallySucceed(org.apache.kafka.connect.util.RetryUtilTest)
org.apache.kafka.connect.errors.ConnectException: Fail to Test after 1
attempts. Reason: null
at
org.apache.kafka.connect.util.RetryUtil.retryUntilTimeout(RetryUtil.java:101)
at
org.apache.kafka.connect.util.RetryUtilTest.retriesEventuallySucceed(RetryUtilTest.java:74)
... 38 trimmed
Caused by: org.apache.kafka.common.errors.TimeoutException
{noformat}
Rather than relying on flat timeouts, the test should be written such that
deadlocks are impossible and the retries proceed deterministically.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)