tinaselenge opened a new pull request, #14573: URL: https://github.com/apache/kafka/pull/14573
- Fixed some of the failing tests in FetchRequestTest. `testFetchWithPartitionsWithIdError` and `testCreateIncrementalFetchWithPartitionsInErrorV12` fail with the following error when run in KRaft mode. They fail only intermittently when run locally but failed consistently in the Jenkins pipeline.

```
expected: <0> but was: <6>
Expected :0
Actual   :6
org.opentest4j.AssertionFailedError: expected: <0> but was: <6>
	at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
	at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
	at app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
	at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:134)
	at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:129)
	at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:325)
	at app//kafka.server.FetchRequestTest.testCreateIncrementalFetchWithPartitionsInErrorV12(FetchRequestTest.scala:547)
	...
```

The tests create topic partitions and send fetch requests for them. They expect error code 0 (no error) in the response but instead receive error code 6, `NOT_LEADER_OR_FOLLOWER`. We throw this exception when trying to read records from a log that doesn’t exist (https://github.com/apache/kafka/blob/dc6a53e19606674bd1276bf05d3ae7a3a2115523/core/src/main/scala/kafka/cluster/Partition.scala#L504).

The issue seems to be that the broker receives the fetch requests before it has created the logs. The following test logs, with extra debug lines added, show that we attempted to read records from the log for topic partition foo-1 a few milliseconds before it was created.
```
[2023-10-18 12:44:18,200] DEBUG [Partition foo-1 broker=0] Getting log for topic with id foo-1 to read its records (kafka.cluster.Partition:62)
[2023-10-18 12:44:18,200] DEBUG [Partition foo-1 broker=0] NOT_LEADER_OR_FOLLOWER foo-1 because log is empty (kafka.cluster.Partition:62)
[2023-10-18 12:44:18,205] INFO Created log for partition foo-1 in /var/folders/fm/65mtddt52vjf8hycyj0rn64r0000gn/T/kafka-6502713585191291330/foo-1 with properties {} (kafka.log.LogManager:66)
[2023-10-18 12:44:18,212] DEBUG [KafkaApi-0] Fetch request with correlation id 1 from client client-id on partition AAAAAAAAAAAAAAAAAAAAAA:foo-1 failed due to org.apache.kafka.common.errors.NotLeaderOrFollowerException (kafka.server.KafkaApis:62)
```

In the ZooKeeper case, the log was created well before the attempt to read records from it (about 200 ms earlier in the logs below), so no error was returned for the fetch requests.

```
[2023-10-18 12:38:31,252] INFO Created log for partition foo-1 in /var/folders/fm/65mtddt52vjf8hycyj0rn64r0000gn/T/kafka-1270567419648769103/foo-1 with properties {} (kafka.log.LogManager:66)
[2023-10-18 12:38:31,455] DEBUG [Partition foo-1 broker=0] Getting log for topic with id foo-1 to read its records (kafka.cluster.Partition:62)
```

The only difference between ZooKeeper and KRaft mode in these tests is the way the topic partitions are created. In ZooKeeper mode, we create the topic partitions directly in ZooKeeper, and the logs therefore seem to be created sooner. In KRaft mode, we use the Admin client to create the topic partitions. Even though the test waits for the topic partitions to be created and to appear in the metadata cache before sending fetch requests, that does not seem to leave enough time for the logs to be created on the brokers. Adding a short sleep after creating the topic partitions and before sending the fetch requests fixed the tests.

- Enabled all tests except `checkLastFetchedEpochValidation` with KRaft mode.
Looking at the build history in Jenkins, all tests other than these two and `checkLastFetchedEpochValidation` were passing when they were enabled with KRaft mode. They are therefore enabled with KRaft mode again, while `checkLastFetchedEpochValidation` is left disabled for further investigation.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
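
The first item above fixes the race with a fixed sleep between topic creation and the fetch requests. For comparison, here is a minimal sketch of the polling alternative, in the same spirit as Kafka's existing `TestUtils.waitUntilTrue` test helper. It is not code from this PR: `WaitForLog`, `waitUntil`, and the simulated "broker" thread are hypothetical names used only for illustration, and the `logCreated` flag stands in for whatever check the test would use to confirm that the log exists on the broker.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class WaitForLog {

    // Hypothetical helper: polls `condition` every `pauseMs` milliseconds and
    // returns true as soon as it holds, or false once `timeoutMs` has elapsed.
    public static boolean waitUntil(BooleanSupplier condition, long timeoutMs, long pauseMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pauseMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: this flag simulates "the log for foo-1 exists on the broker".
        AtomicBoolean logCreated = new AtomicBoolean(false);

        // Simulated broker that creates the log shortly after the topic is created.
        Thread broker = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            logCreated.set(true);
        });
        broker.start();

        // Wait for the log instead of sleeping for a fixed interval,
        // and only then send the (here imaginary) fetch request.
        boolean ready = waitUntil(logCreated::get, 5_000, 10);
        broker.join();
        System.out.println("log ready: " + ready);
    }
}
```

Compared with a fixed sleep, polling returns as soon as the condition holds and fails explicitly on timeout, at the cost of needing a reliable way to observe log creation from the test.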