[ https://issues.apache.org/jira/browse/KAFKA-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107417#comment-15107417 ]
Mayuresh Gharat commented on KAFKA-3083:
----------------------------------------

Hi [~fpj],

Correct me if I am wrong:

1) We need to use a multi-op that combines the update to the ISR with a znode check. The check verifies that the version of the controller leadership znode is unchanged; only if it passes is the ISR data updated.
2) The race condition that [~junrao] mentioned still exists in 1) above.
3) To overcome this, we somehow need to detect that broker A, which was the controller, got a session expiration, so that it immediately drops all the ZK work it is doing.
4) To do step 3), as [~junrao] suggested, we have to detect the connection loss event. Two things might then happen:
   i) Broker A loses its connection and reconnects immediately, in which case it gets a SyncConnected event. The session MIGHT NOT have expired, since the reconnect happened immediately. Broker A is expected to continue, since it is still the controller and the session has not expired.
   ii) Broker A loses its connection and reconnects, in which case it gets a SyncConnected event. The session MIGHT have expired. Broker A is expected to stop all ZK operations.

The only difference between i) and ii) is the session expiration check.

> a soft failure in controller may leave a topic partition in an inconsistent state
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-3083
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3083
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.9.0.0
>            Reporter: Jun Rao
>            Assignee: Mayuresh Gharat
>
> The following sequence can happen.
> 1. Broker A is the controller and is in the middle of processing a broker
> change event. As part of this process, let's say it's about to shrink the isr
> of a partition.
> 2. Then broker A's session expires and broker B takes over as the new
> controller. Broker B sends the initial leaderAndIsr request to all brokers.
> 3. Broker A continues by shrinking the isr of the partition in ZK and sends
> the new leaderAndIsr request to the broker (say C) that leads the partition.
> Broker C will reject this leaderAndIsr since the request comes from a
> controller with an older epoch. Now we could be in a situation that Broker C
> thinks the isr has all replicas, but the isr stored in ZK is different.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
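
Points 1) and 4) of the comment above can be sketched roughly as follows. This is an in-memory model only, not Kafka code and not the ZooKeeper client API: `ControllerGuardSketch`, `VersionedStore`, `Controller`, `SessionEvent` and the path `/controller_leadership` are all hypothetical names made up for illustration. It assumes that every change of controller bumps the version of the leadership znode, and it models session expiration as a separate `EXPIRED` event so that cases i) and ii) can be told apart.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model only: none of these types belong to Kafka or the ZooKeeper client. */
final class ControllerGuardSketch {
    /** Hypothetical path standing in for the controller leadership znode. */
    static final String LEADERSHIP_PATH = "/controller_leadership";

    /** Connection events, loosely named after the ones discussed in the comment. */
    enum SessionEvent { DISCONNECTED, SYNC_CONNECTED, EXPIRED }

    /** Versioned key-value store standing in for znodes. */
    static final class VersionedStore {
        private final Map<String, String> data = new HashMap<>();
        private final Map<String, Integer> versions = new HashMap<>();

        synchronized void create(String path, String value) {
            data.put(path, value);
            versions.put(path, 0);
        }

        synchronized String read(String path) { return data.get(path); }

        synchronized int version(String path) { return versions.get(path); }

        /** Unconditional write; bumps the version of the path. */
        synchronized void set(String path, String value) {
            data.put(path, value);
            versions.merge(path, 1, Integer::sum);
        }

        /** Models the multi-op: check and write happen atomically, or not at all. */
        synchronized boolean checkAndSet(String checkPath, int expectedVersion,
                                         String writePath, String value) {
            Integer current = versions.get(checkPath);
            if (current == null || current != expectedVersion) {
                return false; // leadership changed, so the ISR is left untouched
            }
            set(writePath, value);
            return true;
        }
    }

    /** A controller that remembers the leadership version it was elected with. */
    static final class Controller {
        private final VersionedStore store;
        private final int leadershipVersion;
        private boolean connected = true;
        private boolean expired = false;

        Controller(VersionedStore store, int leadershipVersion) {
            this.store = store;
            this.leadershipVersion = leadershipVersion;
        }

        void onEvent(SessionEvent event) {
            switch (event) {
                case DISCONNECTED:   connected = false; break;
                case SYNC_CONNECTED: connected = true;  break; // case i): session survived
                case EXPIRED:        expired = true;    break; // case ii): stop for good
            }
        }

        /** Attempts the guarded ISR update; returns false if it was skipped or rejected. */
        boolean shrinkIsr(String isrPath, String newIsr) {
            if (expired || !connected) {
                return false; // drop ZK work while disconnected or after expiration
            }
            return store.checkAndSet(LEADERSHIP_PATH, leadershipVersion, isrPath, newIsr);
        }
    }

    public static void main(String[] args) {
        String isrPath = "/isr/topic-0"; // hypothetical path
        VersionedStore store = new VersionedStore();
        store.create(LEADERSHIP_PATH, "A");
        store.create(isrPath, "1,2,3");
        Controller a = new Controller(store, store.version(LEADERSHIP_PATH));

        System.out.println(a.shrinkIsr(isrPath, "1,2")); // true: A is still the controller
        store.set(LEADERSHIP_PATH, "B");                  // broker B takes over
        System.out.println(a.shrinkIsr(isrPath, "1"));   // false: version check fails
    }
}
```

In the real ZooKeeper Java client the corresponding building blocks would presumably be `ZooKeeper.multi` with `Op.check` and `Op.setData`, plus a `Watcher` that reacts to the session state. The local `connected`/`expired` flags only narrow the window in which a stale controller keeps working; the version check inside the atomic operation is what actually rejects the write once leadership has moved.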