[ https://issues.apache.org/jira/browse/KAFKA-16225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853823#comment-17853823 ]
Greg Harris commented on KAFKA-16225:
-------------------------------------

This PR: [https://github.com/apache/kafka/pull/15335] for KAFKA-16234 appears to have substantially decreased the flakiness in this test suite:

!Screenshot 2024-06-10 at 2.39.54 PM.png!

Thanks [~omnia_h_ibrahim] and all of the reviewers on that fix!

Some minor flakiness remains with the following 4 errors in the past 28 days on trunk:

{noformat}
org.opentest4j.AssertionFailedError: Timeout waiting for controller metadata propagating to brokers
    at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38)
    at org.junit.jupiter.api.Assertions.fail(Assertions.java:138)
    at kafka.utils.TestUtils$.ensureConsistentKRaftMetadata(TestUtils.scala:927)
    at kafka.integration.KafkaServerTestHarness.ensureConsistentKRaftMetadata(KafkaServerTestHarness.scala:422)
    at kafka.server.LogDirFailureTest.setUp(LogDirFailureTest.scala:60)
{noformat}

{noformat}
org.opentest4j.AssertionFailedError: Consumed 0 records before timeout instead of the expected 1 records
    at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38)
    at org.junit.jupiter.api.Assertions.fail(Assertions.java:138)
    at kafka.utils.TestUtils$.pollUntilAtLeastNumRecords(TestUtils.scala:927)
    at kafka.server.LogDirFailureTest.testProduceAfterLogDirFailureOnLeader(LogDirFailureTest.scala:200)
    at kafka.server.LogDirFailureTest.testIOExceptionDuringLogRoll(LogDirFailureTest.scala:72)
{noformat}

{noformat}
java.nio.file.FileAlreadyExistsException: /home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/core/data/kafka-1608887280612041858
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
    at java.nio.file.Files.newByteChannel(Files.java:361)
    at java.nio.file.Files.createFile(Files.java:632)
    at kafka.utils.TestUtils$.causeLogDirFailure(TestUtils.scala:1360)
    at kafka.server.LogDirFailureTest.testProduceErrorsFromLogDirFailureOnLeader(LogDirFailureTest.scala:191)
    at kafka.server.LogDirFailureTest.testProduceErrorFromFailureOnCheckpoint(LogDirFailureTest.scala:137)
{noformat}

{noformat}
java.nio.file.FileAlreadyExistsException: /home/jenkins/workspace/Kafka_kafka_trunk/core/data/kafka-13669229279446172937
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:94)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
    at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219)
    at java.nio.file.Files.newByteChannel(Files.java:371)
    at java.nio.file.Files.createFile(Files.java:648)
    at kafka.utils.TestUtils$.causeLogDirFailure(TestUtils.scala:1360)
    at kafka.server.LogDirFailureTest.testLogDirNotificationTimeout(LogDirFailureTest.scala:89)
{noformat}

> Flaky test suite LogDirFailureTest
> ----------------------------------
>
>                 Key: KAFKA-16225
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16225
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, unit tests
>            Reporter: Greg Harris
>            Assignee: Omnia Ibrahim
>            Priority: Major
>              Labels: flaky-test
>         Attachments: Screenshot 2024-06-10 at 2.39.54 PM.png
>
>
> I see this failure on trunk and in PR builds for multiple methods in this test suite:
> {noformat}
> org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
>     at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
>     at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
>     at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
>     at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
>     at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:31)
>     at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:179)
>     at kafka.utils.TestUtils$.causeLogDirFailure(TestUtils.scala:1715)
>     at kafka.server.LogDirFailureTest.testProduceAfterLogDirFailureOnLeader(LogDirFailureTest.scala:186)
>     at kafka.server.LogDirFailureTest.testIOExceptionDuringLogRoll(LogDirFailureTest.scala:70)
> {noformat}
> It appears this assertion is failing
> [https://github.com/apache/kafka/blob/f54975c33135140351c50370282e86c49c81bbdd/core/src/test/scala/unit/kafka/utils/TestUtils.scala#L1715]
> The other error which is appearing is this:
> {noformat}
> org.opentest4j.AssertionFailedError: Unexpected exception type thrown, expected: <java.util.concurrent.ExecutionException> but was: <java.lang.IllegalStateException>
>     at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
>     at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:67)
>     at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:35)
>     at org.junit.jupiter.api.Assertions.assertThrows(Assertions.java:3111)
>     at kafka.server.LogDirFailureTest.testProduceErrorsFromLogDirFailureOnLeader(LogDirFailureTest.scala:164)
>     at kafka.server.LogDirFailureTest.testProduceErrorFromFailureOnLogRoll(LogDirFailureTest.scala:64)
> {noformat}
> Failures appear to have started in this commit, but this does not indicate that this commit is at fault:
> [https://github.com/apache/kafka/tree/3d95a69a28c2d16e96618cfa9a1eb69180fb66ea]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)