[ 
https://issues.apache.org/jira/browse/KAFKA-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535843#comment-17535843
 ] 

Chris Egerton commented on KAFKA-12657:
---------------------------------------

I've reviewed the test runs and managed to reproduce locally by invoking 
{{./gradlew :connect:runtime:integrationTest}}. It appears that the issue 
is environmental; every failure occurs because some condition isn't met in 
time, and oftentimes, that condition is simply starting an embedded Connect 
worker at the beginning of the test.
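
To illustrate why this kind of failure points at a slow environment rather than a logic bug, here is a minimal sketch of a poll-until-timeout helper. It is not the actual {{TestUtils.waitForCondition}} implementation; the class name {{ConditionWaiter}}, the method signature, and the simulated 200 ms startup delay are all assumptions made for illustration. The same condition passes or fails depending only on how long the thing being waited on takes relative to the timeout.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only; not Kafka's TestUtils.
final class ConditionWaiter {

    // Polls `condition` until it returns true or `timeout` elapses.
    // Returns true if the condition was met in time, false otherwise.
    static boolean waitForCondition(BooleanSupplier condition,
                                    Duration timeout,
                                    Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out before the condition held
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and treat it as "not met".
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // Assumed stand-in for a worker that needs about 200 ms to start.
        long startupNanos = Duration.ofMillis(200).toNanos();
        BooleanSupplier workerStarted =
                () -> System.nanoTime() - start >= startupNanos;

        // Same condition, two timeouts: only the time budget differs.
        boolean tooShort = waitForCondition(
                workerStarted, Duration.ofMillis(50), Duration.ofMillis(10));
        boolean generous = waitForCondition(
                workerStarted, Duration.ofSeconds(2), Duration.ofMillis(10));

        System.out.println("met within 50 ms: " + tooShort);
        System.out.println("met within 2 s: " + generous);
    }
}
```

On a heavily loaded machine the startup itself takes longer, so a timeout that is comfortable locally can expire in CI without anything being wrong in the test's logic.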

I was able to achieve a completely green run by adding {{-PmaxParallelForks=5}} 
to my test run, which suggests that reducing the number of test cases running 
concurrently (or even just the number of running threads in general) would 
alleviate the issue. Unfortunately, it looks like our Jenkinsfile 
[already sets that value even lower|https://github.com/apache/kafka/blob/85cfa70f59162d3b7ae23c55bb3f3fe97e56ba80/Jenkinsfile#L40], 
to just 2.

[~mjsax] [~cadonna] do you have any details about the CI architecture these 
tests are run on? Are the Jenkins nodes that carry out these tests shared 
concurrently by different Jenkins jobs/stages? How many cores are available on 
them? And, as a sanity check, are we still using the project's {{Jenkinsfile}} 
to run these tests (and set {{-PmaxParallelForks=2}})?

> Flaky Tests BlockingConnectorTest.testWorkerRestartWithBlockInConnectorStop
> ---------------------------------------------------------------------------
>
>                 Key: KAFKA-12657
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12657
>             Project: Kafka
>          Issue Type: Test
>          Components: KafkaConnect
>            Reporter: Matthias J. Sax
>            Priority: Critical
>              Labels: flaky-test
>
> [https://github.com/apache/kafka/pull/10506/checks?check_run_id=2327377745]
> {quote} {{org.opentest4j.AssertionFailedError: Condition not met within timeout 60000. Worker did not complete startup in time ==> expected: <true> but was: <false>
>       at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
>       at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:40)
>       at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:193)
>       at org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:319)
>       at org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:367)
>       at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:316)
>       at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:300)
>       at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:290)
>       at org.apache.kafka.connect.integration.BlockingConnectorTest.setup(BlockingConnectorTest.java:133)}}
> {quote}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
