[jira] [Updated] (KAFKA-1097) Race condition while reassigning partition leads to incorrect ISR information in zookeeper

Neha Narkhede (JIRA) Tue, 22 Oct 2013 09:26:32 -0700

     [ 
https://issues.apache.org/jira/browse/KAFKA-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Neha Narkhede updated KAFKA-1097:
---------------------------------

    Description: 
While moving partitions, the controller moves the old replicas through the 
following state changes -

ONLINE -> OFFLINE -> NON_EXISTENT

During the offline state change, the controller removes the old replica and 
writes the updated ISR to zookeeper and notifies the leader. Note that it 
doesn't notify the old replicas to stop fetching from the leader (to be fixed 
in KAFKA-1032). During the non-existent state change, the controller does not 
write the updated ISR or replica list to zookeeper. Right after the 
non-existent state change, the controller writes the new replica list to 
zookeeper, but does not update the ISR. So an old replica can send a fetch 
request after the offline state change, essentially letting the leader add it 
back to the ISR. The problem is that if no new data is coming in for the 
partition and the old replica is fully caught up, the leader cannot remove it 
from the ISR. That lets a non-existent replica live in the ISR at least until 
new data comes into the partition.

  was:
While moving partitions, the controller moves the old replicas through the 
following state changes -

ONLINE -> OFFLINE -> NON_EXISTENT

During the offline state change, the controller removes the old replica and 
writes the updated ISR to zookeeper and notifies the leader. Note that it 
doesn't notify the old replicas to stop fetching from the leader (to be fixed 
in KAFKA-1032). During the non-existent state change, the controller does not 
write the updated ISR or replica list to zookeeper. Right after the 
non-existent state change, the controller writes the new replica list to 
zookeeper, but does not update the ISR. So an old replica can send a fetch 
request after the offline state change, essentially letting the leader add it 
back to the ISR. That lets a non existent replica live in the ISR


> Race condition while reassigning partition leads to incorrect ISR information 
> in zookeeper 
> -------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1097
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1097
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Critical
>             Fix For: 0.8
>
>
> While moving partitions, the controller moves the old replicas through the 
> following state changes -
> ONLINE -> OFFLINE -> NON_EXISTENT
> During the offline state change, the controller removes the old replica and 
> writes the updated ISR to zookeeper and notifies the leader. Note that it 
> doesn't notify the old replicas to stop fetching from the leader (to be fixed 
> in KAFKA-1032). During the non-existent state change, the controller does not 
> write the updated ISR or replica list to zookeeper. Right after the 
> non-existent state change, the controller writes the new replica list to 
> zookeeper, but does not update the ISR. So an old replica can send a fetch 
> request after the offline state change, essentially letting the leader add it 
> back to the ISR. The problem is that if no new data is coming in for the 
> partition and the old replica is fully caught up, the leader cannot remove 
> it from the ISR. That lets a non-existent replica live in the ISR at least 
> until new data comes into the partition.
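
The ordering of events in the description can be replayed with a small toy
model. This is only a sketch for illustration: the class, the function names,
the replica ids and the offset of 100 are all assumptions made up here, and
the lag check is a deliberate simplification rather than the broker's real
ISR logic. It shows the old replica (3) ending up in the ISR while absent
from the replica list.

```python
# Toy model of the race described above. All names here are illustrative
# assumptions; none of this is Kafka's actual controller or broker code.

class Partition:
    def __init__(self, assigned, isr, log_end_offset):
        self.assigned = set(assigned)   # replica list as stored in ZooKeeper
        self.isr = set(isr)             # ISR as stored in ZooKeeper
        self.log_end_offset = log_end_offset
        self.fetch_offsets = {}         # last fetch offset seen per follower


def controller_offline(p, replica):
    """OFFLINE state change: controller removes the replica from the ISR."""
    p.isr.discard(replica)


def controller_write_assignment(p, new_assigned):
    """After NON_EXISTENT: the replica list is rewritten, the ISR is not."""
    p.assigned = set(new_assigned)


def leader_on_fetch(p, replica, offset):
    """Leader re-adds any follower whose fetch offset has caught up."""
    p.fetch_offsets[replica] = offset
    if offset >= p.log_end_offset:
        p.isr.add(replica)


def leader_shrink_isr(p):
    """Simplified lag check: drop only followers known to be behind."""
    behind = {r for r, off in p.fetch_offsets.items() if off < p.log_end_offset}
    p.isr -= behind


def simulate_race():
    # Reassigning from {1, 2, 3} to {1, 2, 4}; replica 4 has already caught up.
    p = Partition(assigned={1, 2, 3, 4}, isr={1, 2, 3, 4}, log_end_offset=100)
    controller_offline(p, 3)                    # ISR becomes {1, 2, 4}
    leader_on_fetch(p, 3, 100)                  # old replica was never told to stop
    controller_write_assignment(p, {1, 2, 4})   # ISR left untouched
    leader_shrink_isr(p)                        # no new data, so nothing lags
    return p


if __name__ == "__main__":
    p = simulate_race()
    print(sorted(p.isr - p.assigned))  # prints [3]
```

In this model the stale entry persists because the only step that could
remove it, the lag check, has nothing to act on while the log end offset
stays put.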



--
This message was sent by Atlassian JIRA
(v6.1#6144)
