[ https://issues.apache.org/jira/browse/KAFKA-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guozhang Wang resolved KAFKA-10614.
-----------------------------------
Fix Version/s: 3.0.0
Resolution: Fixed
> Group coordinator onElection/onResignation should guard against leader epoch
> ----------------------------------------------------------------------------
>
> Key: KAFKA-10614
> URL: https://issues.apache.org/jira/browse/KAFKA-10614
> Project: Kafka
> Issue Type: Bug
> Components: core
> Reporter: Guozhang Wang
> Assignee: Tom Bentley
> Priority: Major
> Fix For: 3.0.0
>
>
> When a sequence of LeaderAndISR or StopReplica requests is sent from
> different controllers, causing the group coordinator to elect / resign, the
> events may be re-ordered due to a race condition. For example:
> 1) First LeaderAndISR request received from old controller to resign as the
> group coordinator.
> 2) Second LeaderAndISR request received from new controller to elect as the
> group coordinator.
> 3) Although the threads handling requests 1) and 2) are synchronized on the
> replica manager, their callback {{onLeadershipChange}} triggers
> {{onElection/onResignation}}, which schedule the loading / unloading on
> background threads that are not synchronized.
> 4) As a result, {{onElection}} may be executed first and {{onResignation}}
> afterwards. The coordinator would then not recognize itself as the
> coordinator and hence would respond to any coordinator request with
> {{NOT_COORDINATOR}}.
> Here are two proposals off the top of my head:
> 1) Let the scheduled load / unload function keep the passed-in leader
> epoch, and also materialize the epoch in memory. Then, when executing the
> unloading, check against the leader epoch.
> 2) This may be a bit simpler: use a single background thread working on a
> FIFO queue of loading / unloading jobs. Since the callers are actually
> synchronized on the replica manager and order is preserved, the enqueued
> loading / unloading jobs would be correctly ordered as well. In that case we
> would avoid the reordering.
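
The two proposals can be illustrated with a minimal Java sketch. This is not the actual Kafka code or the fix that was merged: the class names `EpochGuardedCoordinator` and `FifoCoordinatorScheduler`, their methods, and the rule that a job with an older epoch is skipped are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Proposal 1 (hypothetical sketch): remember the highest leader epoch applied
// per partition and ignore load / unload jobs that carry an older epoch.
class EpochGuardedCoordinator {
    private final Map<Integer, Integer> epochs = new HashMap<>();
    private final Map<Integer, Boolean> owned = new HashMap<>();

    // Returns false if the job was skipped because its epoch is stale.
    synchronized boolean load(int partition, int leaderEpoch) {
        if (leaderEpoch < epochs.getOrDefault(partition, -1)) return false;
        epochs.put(partition, leaderEpoch);
        owned.put(partition, true);
        return true;
    }

    // Returns false if the job was skipped because its epoch is stale.
    synchronized boolean unload(int partition, int leaderEpoch) {
        if (leaderEpoch < epochs.getOrDefault(partition, -1)) return false;
        epochs.put(partition, leaderEpoch);
        owned.put(partition, false);
        return true;
    }

    synchronized boolean isCoordinator(int partition) {
        return owned.getOrDefault(partition, false);
    }
}

// Proposal 2 (hypothetical sketch): one background thread drains a FIFO queue,
// so jobs run in exactly the order in which the callers enqueued them.
class FifoCoordinatorScheduler {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    void enqueue(Runnable job) {
        worker.execute(job);
    }

    // Stops accepting jobs and waits (up to 5 seconds) for queued jobs to finish.
    void shutdownAndWait() {
        worker.shutdown();
        try {
            worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With the epoch guard, a resignation from the old controller (say, epoch 4) that runs after an election from the new controller (epoch 5) is skipped, so the broker still considers itself the coordinator. With the single-threaded scheduler the reordering cannot happen in the first place, as long as the callers enqueue in the order enforced by the replica manager lock.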