[ https://issues.apache.org/jira/browse/KAFKA-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neha Narkhede updated KAFKA-532:
--------------------------------
Attachment: kafka-532-v2.patch
Thanks for the review, Jun!
1, 3. That's a bug; fixed it.
2. Changed it to be a case class instead of a tuple.
4. While adding comments, I realized that there was a bug in the way we computed
the size of the leader and isr request. The size had an extra 1 byte at the
beginning; I'm not sure whether it's required. This is probably a bug
introduced in the very first version of the controller that we didn't catch
during testing.
6. I'm afraid that will not solve the problem. The whole point of the
controller generation is to prevent the brokers from following requests sent by
a stale controller. It doesn't matter whether the controller is re-publishing
the old controller's decision or making its own; once it sends the decision to
the brokers, it is effectively certifying that decision to be the right one.
Hence, both the leader and isr request and the stop replica request need to
contain the epoch of the controller sending the request.

With the above semantics, the new controller should re-write the leader and isr
path with its epoch after sending the leader and isr request to the brokers.
However, re-writing the path during controller failover will increase the
controller failover latency. An alternative is to do this in the leader and isr
response callback. Currently, we rely on asynchronous leader election to work
correctly. Ideally, we need to be able to act on the event that the leader and
isr response is either negative or lost. When this happens, leader election
needs to be triggered again. Since this is asynchronous, we can also update the
leader and isr path with the new controller's epoch on receiving a successful
leader and isr response. If this sounds good, I can either make the changes in
patch v3 or file another JIRA. Let me know what you prefer. Until then, the
broker will re-write the zk path with the latest controller epoch, which is
theoretically correct, but not semantically.
5, 7. With the semantics mentioned above, the brokers should just write the isr
with the controller epoch that they know.
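
To make the epoch semantics above concrete, here is a minimal sketch of the
broker-side check, not the code in the patch. It assumes every state change
request carries the epoch of the controller that sent it and that the broker
remembers the highest epoch it has seen. The names StateChangeRequest, Broker
and handle are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateChangeRequest:
    """Hypothetical stand-in for a leader and isr or stop replica request."""
    controller_epoch: int  # epoch of the controller sending the request
    kind: str              # e.g. "leader_and_isr" or "stop_replica"


class Broker:
    """Hypothetical broker that tracks the latest controller epoch it knows."""

    def __init__(self) -> None:
        # Highest controller epoch seen so far; 0 means none seen yet.
        self.controller_epoch = 0

    def handle(self, request: StateChangeRequest) -> bool:
        """Return True if the request is accepted, False if it is stale."""
        if request.controller_epoch < self.controller_epoch:
            # Sent by a controller from an older generation: reject it.
            return False
        # Same or newer generation: remember the epoch and accept.
        self.controller_epoch = request.controller_epoch
        return True


broker = Broker()
assert broker.handle(StateChangeRequest(1, "leader_and_isr"))    # first controller
assert broker.handle(StateChangeRequest(2, "leader_and_isr"))    # new controller
assert not broker.handle(StateChangeRequest(1, "stop_replica"))  # stale controller
```

In this sketch a request from the old controller (epoch 1) is rejected once the
broker has accepted a request from the new one (epoch 2), regardless of whether
it is a leader and isr request or a stop replica request.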
> Multiple controllers can co-exist during soft failures
> ------------------------------------------------------
>
> Key: KAFKA-532
> URL: https://issues.apache.org/jira/browse/KAFKA-532
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8
> Reporter: Neha Narkhede
> Assignee: Neha Narkhede
> Priority: Blocker
> Labels: bugs
> Attachments: kafka-532-v1.patch, kafka-532-v2.patch
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> If the current controller experiences an intermittent soft failure (GC pause)
> in the middle of leader election or partition reassignment, a new controller
> might get elected and start communicating new state change decisions to the
> brokers. After recovering from the soft failure, the old controller might
> continue sending some stale state change decisions to the brokers, resulting
> in unexpected failures. We need to introduce a controller generation id that
> increments with controller election. The brokers should reject any state
> change requests by a controller with an older generation id.