Jiangjie Qin created KAFKA-5453:
-----------------------------------

             Summary: Controller may miss requests sent to the broker when zk 
session timeout happens.
                 Key: KAFKA-5453
                 URL: https://issues.apache.org/jira/browse/KAFKA-5453
             Project: Kafka
          Issue Type: Bug
            Reporter: Jiangjie Qin


The issue I encountered was the following:
1. Partition reassignment was in progress; one replica of a partition was being 
reassigned from broker 1 to broker 2.
2. The controller received an ISR change notification indicating that broker 2 
had caught up.
3. The controller was sending a StopReplicaRequest to broker 1.
4. Broker 1's zk session timed out. The controller removed broker 1 from the 
cluster and cleaned up its request queue, i.e. the StopReplicaRequest was 
removed from the ControllerChannelManager.
5. Broker 1 reconnected to zk and acted as if it were still a follower replica 
of the partition.
6. Broker 1 then kept receiving an exception from the leader because it was no 
longer in the replica list.
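
The sequence above can be sketched as a toy model. This is only an 
illustration under stated assumptions: the class, its fields, and the string 
request format are hypothetical and are not Kafka's controller code. It shows 
how discarding a broker's pending queue on session expiry can leave that 
broker holding a replica the controller no longer assigns to it.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Toy model only: every name here is hypothetical, not Kafka's actual code.
class LostStopReplicaSketch {
    // Pending controller-to-broker requests, keyed by broker id.
    private final Map<Integer, Queue<String>> pending = new HashMap<>();
    // Replica assignment as the controller sees it: partition -> broker ids.
    final Map<String, Set<Integer>> assignment = new HashMap<>();
    // Replicas each broker believes it hosts: broker id -> partitions.
    final Map<Integer, Set<String>> hostedByBroker = new HashMap<>();

    void enqueue(int brokerId, String request) {
        pending.computeIfAbsent(brokerId, id -> new ArrayDeque<>()).add(request);
    }

    // Step 4: the broker's zk session expires, so its queue is discarded.
    void onBrokerSessionExpired(int brokerId) {
        pending.remove(brokerId);
    }

    // Deliver whatever is still queued; a StopReplica request makes the
    // broker drop the named replica.
    void deliver(int brokerId) {
        Queue<String> queue = pending.getOrDefault(brokerId, new ArrayDeque<>());
        while (!queue.isEmpty()) {
            String request = queue.poll();
            if (request.startsWith("StopReplica:")) {
                String partition = request.substring("StopReplica:".length());
                hostedByBroker.get(brokerId).remove(partition);
            }
        }
    }

    // True when the broker still hosts a replica that the controller no
    // longer assigns to it.
    boolean hasStaleReplica(int brokerId, String partition) {
        return hostedByBroker.get(brokerId).contains(partition)
                && !assignment.get(partition).contains(brokerId);
    }

    static boolean run(boolean sessionExpires) {
        LostStopReplicaSketch controller = new LostStopReplicaSketch();
        String tp = "topic-0";
        controller.hostedByBroker.put(1, new HashSet<>(Collections.singleton(tp)));
        // Steps 1-2: the reassignment to broker 2 has completed.
        controller.assignment.put(tp, new HashSet<>(Collections.singleton(2)));
        controller.enqueue(1, "StopReplica:" + tp);                 // step 3
        if (sessionExpires) {
            controller.onBrokerSessionExpired(1);                   // step 4
        }
        controller.deliver(1);                                      // step 5
        return controller.hasStaleReplica(1, tp);                   // leads to step 6
    }

    public static void main(String[] args) {
        System.out.println("stale replica without expiry: " + run(false));
        System.out.println("stale replica with expiry:    " + run(true));
    }
}
```

In this model the stale replica appears only when the session expiry falls 
between enqueueing and delivering the request, which matches the window 
described in steps 3 and 4.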

I am not sure what the correct fix is here. It seems that broker 1 in this case 
should ask the controller for the latest replica assignment.

There are two related bugs:
1. When a {{NotAssignedReplicaException}} is thrown from 
{{Partition.updateReplicaLogReadResult()}}, the other partitions in the same 
request fail to update their fetch timestamp and offset and thus also drop 
out of the ISR.

2. The {{NotAssignedReplicaException}} is not properly returned to the 
replicas; instead, an {{UnknownServerException}} is returned.
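
Both related bugs have the same shape: one partition's failure escapes the 
per-request loop and is then reported generically. The following is a minimal 
sketch of that shape and of a per-partition alternative. It is not Kafka's 
code: the class, the methods, the locally defined exception, and the error 
strings are all hypothetical stand-ins.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: every name here is hypothetical, not Kafka's actual code.
class FetchUpdateSketch {
    // Local stand-in for the exception named in the report.
    static class NotAssignedReplicaException extends RuntimeException {
        NotAssignedReplicaException(String partition) {
            super("follower is not an assigned replica of " + partition);
        }
    }

    // Record the follower's fetch offset, or throw if it is not assigned.
    static void updateReplica(String partition, Set<String> assigned,
                              Map<String, Long> fetchOffsets, long offset) {
        if (!assigned.contains(partition)) {
            throw new NotAssignedReplicaException(partition);
        }
        fetchOffsets.put(partition, offset);
    }

    // Problematic shape: the first failure aborts the loop, so later
    // partitions are never updated, and every partition gets a generic error.
    static Map<String, String> updateAllOrNothing(List<String> partitions,
            Set<String> assigned, Map<String, Long> fetchOffsets) {
        Map<String, String> errors = new LinkedHashMap<>();
        try {
            for (String p : partitions) {
                updateReplica(p, assigned, fetchOffsets, 100L);
                errors.put(p, "NONE");
            }
        } catch (RuntimeException e) {
            for (String p : partitions) {
                errors.put(p, "UnknownServerException");
            }
        }
        return errors;
    }

    // Alternative shape: failures are isolated per partition and the
    // specific error is preserved for the partition that caused it.
    static Map<String, String> updatePerPartition(List<String> partitions,
            Set<String> assigned, Map<String, Long> fetchOffsets) {
        Map<String, String> errors = new LinkedHashMap<>();
        for (String p : partitions) {
            try {
                updateReplica(p, assigned, fetchOffsets, 100L);
                errors.put(p, "NONE");
            } catch (NotAssignedReplicaException e) {
                errors.put(p, "NotAssignedReplicaException");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<String> partitions = Arrays.asList("t-0", "t-1", "t-2");
        // The follower is no longer an assigned replica of t-0.
        Set<String> assigned = new HashSet<>(Arrays.asList("t-1", "t-2"));

        Map<String, Long> offsetsA = new HashMap<>();
        System.out.println(updateAllOrNothing(partitions, assigned, offsetsA)
                + " updated=" + offsetsA.keySet());

        Map<String, Long> offsetsB = new HashMap<>();
        System.out.println(updatePerPartition(partitions, assigned, offsetsB)
                + " updated=" + offsetsB.keySet());
    }
}
```

With the per-partition variant, t-1 and t-2 still record their fetch offsets 
and only t-0 carries the specific error. Whether the actual fix should take 
this form is not established by the report.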



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
