[ 
https://issues.apache.org/jira/browse/KAFKA-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16011851#comment-16011851
 ] 

Onur Karaman edited comment on KAFKA-5175 at 5/16/17 7:01 AM:
--------------------------------------------------------------

Cool. I can consistently reproduce this by inserting a long sleep in 
Partition.maybeExpandIsr:
{code}
diff --git a/core/src/main/scala/kafka/cluster/Partition.scala b/core/src/main/scala/kafka/cluster/Partition.scala
index 1d13689..b811d31 100755
--- a/core/src/main/scala/kafka/cluster/Partition.scala
+++ b/core/src/main/scala/kafka/cluster/Partition.scala
@@ -281,6 +281,7 @@ class Partition(val topic: String,
   def maybeExpandIsr(replicaId: Int, logReadResult: LogReadResult): Boolean = {
     inWriteLock(leaderIsrUpdateLock) {
       // check if this replica needs to be added to the ISR
+      Thread.sleep(100000)
       leaderReplicaIfLocal match {
         case Some(leaderReplica) =>
           val replica = getReplica(replicaId).get
{code}

This makes me think that the controller is processing the preferred replica 
leader election before the restarted broker (the preferred replica leader) has 
rejoined the ISR. That causes the preferred replica leader election to fail, 
which in turn causes the final ZooKeeper state validation to fail.
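
To make the suspected ordering concrete, here is a minimal, self-contained sketch. It is a simplified model and not Kafka's actual controller or Partition code: PartitionState, electPreferredLeader and expandIsr are hypothetical names, the broker ids are illustrative, and it assumes the election only succeeds when the preferred replica is already in the ISR.

```scala
// Hypothetical, simplified model of one partition; not Kafka's real classes.
final case class PartitionState(leader: Int, isr: Set[Int], assignedReplicas: Seq[Int]) {
  // Assumption: the preferred replica is the first assigned replica.
  def preferredReplica: Int = assignedReplicas.head
}

object PreferredElectionSketch {
  // Moves leadership to the preferred replica only if it is already in the ISR.
  def electPreferredLeader(state: PartitionState): Either[String, PartitionState] =
    if (state.leader == state.preferredReplica) Right(state)
    else if (state.isr.contains(state.preferredReplica))
      Right(state.copy(leader = state.preferredReplica))
    else
      Left(s"preferred replica ${state.preferredReplica} is not in the ISR")

  // Models the leader adding a caught-up follower to the ISR,
  // i.e. the step that the injected sleep delays.
  def expandIsr(state: PartitionState, replicaId: Int): PartitionState =
    state.copy(isr = state.isr + replicaId)

  def main(args: Array[String]): Unit = {
    // Broker 0 is preferred but was just restarted; broker 1 leads alone.
    val afterRestart = PartitionState(leader = 1, isr = Set(1), assignedReplicas = Seq(0, 1))

    // Election handled before the ISR expansion: rejected, broker 1 stays leader.
    println(electPreferredLeader(afterRestart))

    // Election handled after the ISR expansion: leadership moves to broker 0.
    println(electPreferredLeader(expandIsr(afterRestart, 0)))
  }
}
```

In this model the outcome depends only on whether the ISR expansion happens before or after the election is processed, which is the window the sleep widens.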

{code}
kafka.controller.ControllerIntegrationTest > testPreferredReplicaLeaderElection FAILED
    java.lang.AssertionError: failed to get expected partition state upon broker startup
        at kafka.utils.TestUtils$.fail(TestUtils.scala:323)
        at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:823)
        at kafka.controller.ControllerIntegrationTest.waitForPartitionState(ControllerIntegrationTest.scala:291)
        at kafka.controller.ControllerIntegrationTest.testPreferredReplicaLeaderElection(ControllerIntegrationTest.scala:204)
{code}


> Transient failure: ControllerIntegrationTest.testPreferredReplicaLeaderElection
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-5175
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5175
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Ismael Juma
>            Assignee: Onur Karaman
>
> {code}
> java.lang.AssertionError: failed to get expected partition state upon broker startup
>       at kafka.utils.TestUtils$.fail(TestUtils.scala:311)
>       at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:811)
>       at kafka.controller.ControllerIntegrationTest.waitForPartitionState(ControllerIntegrationTest.scala:293)
>       at kafka.controller.ControllerIntegrationTest.testPreferredReplicaLeaderElection(ControllerIntegrationTest.scala:211)
> {code}
> https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3497/testReport/kafka.controller/ControllerIntegrationTest/testPreferredReplicaLeaderElection/



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
