[ https://issues.apache.org/jira/browse/KAFKA-9750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066576#comment-17066576 ]
ASF GitHub Bot commented on KAFKA-9750: --------------------------------------- chia7712 commented on pull request #8344: KAFKA-9750 Flaky test kafka.server.ReplicaManagerTest.testFencedError… URL: https://github.com/apache/kafka/pull/8344 ```scala // change the epoch from 0 to 1 in order to make fenced error replicaManager.becomeLeaderOrFollower(0, leaderAndIsrRequest(1), (_, _) => ()) TestUtils.waitUntilTrue(() => replicaManager.replicaAlterLogDirsManager.fetcherThreadMap.values.forall(_.partitionCount() == 0), s"the partition=$topicPartition should be removed from pending state") ``` The root cause is race condition. The partition is add to the end instead of being removed if the epoch in ReplicaAlterLogDirsThread is increased. This PR includes following changes. 1. controls the lock of ReplicaAlterLogDirsThread to make the fenced error happen almost. 1. wait for the completion of thread ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Flaky test kafka.server.ReplicaManagerTest.testFencedErrorCausedByBecomeLeader > ------------------------------------------------------------------------------ > > Key: KAFKA-9750 > URL: https://issues.apache.org/jira/browse/KAFKA-9750 > Project: Kafka > Issue Type: Bug > Components: core > Reporter: Bob Barrett > Assignee: Chia-Ping Tsai > Priority: Major > Labels: flaky-test > > When running tests locally, I've seen that 1-2% of the time, > testFencedErrorCausedByBecomeLeader fails with > {code:java} > org.scalatest.exceptions.TestFailedException: the partition=test-topic-0 > should be removed from pending > stateorg.scalatest.exceptions.TestFailedException: the partition=test-topic-0 > should be removed from pending state > at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) > at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389) > at org.scalatest.Assertions.fail(Assertions.scala:1091) at > org.scalatest.Assertions.fail$(Assertions.scala:1087) at > org.scalatest.Assertions$.fail(Assertions.scala:1389) at > kafka.server.ReplicaManagerTest.testFencedErrorCausedByBecomeLeader(ReplicaManagerTest.scala:248) > at jdk.internal.reflect.GeneratedMethodAccessor25.invoke(Unknown Source) at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at > org.junit.runners.ParentRunner.run(ParentRunner.java:413) at > org.junit.runner.JUnitCore.run(JUnitCore.java:137) at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:40) > at > com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230) > at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)