[ 
https://issues.apache.org/jira/browse/KAFKA-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-532:
--------------------------------

    Attachment: kafka-532-v2.patch

Thanks for the review, Jun!

1, 3. That's a bug; fixed it.
2. Changed it to be a case class instead of a tuple.
4. While adding comments, I realized that there was a bug in the way we computed 
the size of the leader and isr request. The size included an extra 1 byte at 
the beginning; I'm not sure whether it's required. This is probably a bug 
introduced in the very first version of the controller that we didn't catch 
during testing.

6. I'm afraid that will not solve the problem. The whole point of the 
controller generation is to prevent the brokers from following requests sent by 
a stale controller. It doesn't matter whether the controller is re-publishing 
the old controller's decision or making its own; once it sends the decision to 
the brokers, it is effectively certifying that decision to be the right one. 
Hence, both the leader and isr request and the stop replica request need to 
contain the epoch of the controller sending the request.

With the above semantics, the new controller should re-write the leader and isr 
path with its epoch after sending the leader and isr request to the brokers. 
However, re-writing the path during controller failover will increase the 
controller failover latency. An alternative is to do this in the leader and isr 
response callback.

Currently, we rely on asynchronous leader election to work correctly. Ideally, 
we need to be able to act on the event that the leader and isr response is 
either negative or lost. When this happens, leader election needs to be 
triggered again. Since this is asynchronous, we can also update the leader and 
isr path with the new controller's epoch on receiving a successful leader and 
isr response. If this sounds good, I can either make the changes in patch v3 or 
file another JIRA. Let me know what you prefer. Until then, the broker will 
re-write the zk path with the latest controller epoch, which is theoretically 
correct, but not semantically.

5, 7. With the semantics mentioned above, the brokers should just write the isr 
with the controller epoch that they know.
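
To make the epoch semantics in 6 concrete, here is a minimal sketch of the 
broker-side check, assuming each leader and isr or stop replica request carries 
the sending controller's epoch as an integer. The class and method names 
(ControllerEpochGuard, accept, latestEpoch) are hypothetical and are not taken 
from the patch; the sketch only illustrates the rule that a request from an 
older epoch is rejected.

```java
/**
 * Hypothetical broker-side guard against stale controllers.
 * Names are illustrative only and do not come from the Kafka code base.
 */
final class ControllerEpochGuard {
    // Highest controller epoch this broker has seen so far.
    private int latestControllerEpoch = 0;

    /**
     * Decides whether a state change request may be applied.
     *
     * @param requestControllerEpoch epoch carried by the incoming request
     * @return true if the request is from the current (or a newer) controller,
     *         false if it was sent by a stale controller
     */
    synchronized boolean accept(int requestControllerEpoch) {
        if (requestControllerEpoch < latestControllerEpoch) {
            // Sent by a controller that has since been superseded: reject.
            return false;
        }
        // Remember the newest epoch so that older controllers are fenced off.
        latestControllerEpoch = requestControllerEpoch;
        return true;
    }

    /** Epoch the broker would use when it writes the isr itself. */
    synchronized int latestEpoch() {
        return latestControllerEpoch;
    }
}
```

With this check, a controller that resumes after a GC pause with epoch 1 is 
ignored once the broker has seen a request from the controller with epoch 2, 
and latestEpoch() is the value the broker would write alongside the isr.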

                
> Multiple controllers can co-exist during soft failures
> ------------------------------------------------------
>
>                 Key: KAFKA-532
>                 URL: https://issues.apache.org/jira/browse/KAFKA-532
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Blocker
>              Labels: bugs
>         Attachments: kafka-532-v1.patch, kafka-532-v2.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> If the current controller experiences an intermittent soft failure (GC pause) 
> in the middle of leader election or partition reassignment, a new controller 
> might get elected and start communicating new state change decisions to the 
> brokers. After recovering from the soft failure, the old controller might 
> continue sending some stale state change decisions to the brokers, resulting 
> in unexpected failures. We need to introduce a controller generation id that 
> increments with controller election. The brokers should reject any state 
> change requests by a controller with an older generation id.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
