[
https://issues.apache.org/jira/browse/KAFKA-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kirk True updated KAFKA-16954:
------------------------------
Labels: kip-848-client-support (was: )
> Move consumer leave operations on close to background thread
> ------------------------------------------------------------
>
> Key: KAFKA-16954
> URL: https://issues.apache.org/jira/browse/KAFKA-16954
> Project: Kafka
> Issue Type: Bug
> Components: clients, consumer
> Reporter: Lianet Magrans
> Assignee: Lianet Magrans
> Priority: Blocker
> Labels: kip-848-client-support
> Fix For: 3.8.0
>
>
> When a consumer unsubscribes, the app thread simply triggers an Unsubscribe
> event that will take care of it all in the background thread: release
> assignment (callbacks), clear assigned partitions, and send leave group HB.
> In contrast, when a consumer is closed, these actions are split across both
> threads:
> * release assignment -> in the app thread by directly running the callbacks
> * clear assignment -> in app thread by updating the subscriptionState
> * send leave group HB -> in the background thread via an event LeaveOnClose
> This split can lead to race conditions, mainly because close updates the
> subscription state in the app thread while background operations that depend
> on it may already be running. For example:
> * unsubscribe in app thread (triggers background UnsubscribeEvent to revoke
> and leave)
> * unsubscribe fails (e.g. interrupted, leaving the operation running in the
> background thread to revoke partitions and leave)
> * consumer close (will revoke and clear assignment in the app thread)
> * UnsubscribeEvent in the background may fail by trying to revoke
> partitions that it does not own anymore - _No current assignment for
> partition ..._
> A basic check has been added to the background thread revocation to avoid the
> race condition, ensuring that we only revoke partitions we own, but we should
> still address the root cause, which is updating the assignment on the app
> thread. We should consider having the close operation as a single
> LeaveOnClose event handled in the background. That event already takes care
> of revoking the partitions and clearing the assignment in the background, so
> there is no need to do it in the app thread. We should only ensure that we
> processBackgroundEvents until the LeaveOnClose completes (to allow for
> callbacks to run in the app thread).
>
> Trying to understand the current approach, I imagine the initial motivation
> for running the callbacks (and clearing the assignment) in the app thread was
> to avoid the back-and-forth: app thread close -> background thread leave event
> -> app thread to run callback -> background thread to clear assignment and
> send HB. But updating the assignment on the app thread ends up being
> problematic: the assignment is otherwise updated in the background, so doing
> it from the app thread opens the door to race conditions on the subscription
> state.
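
The flow proposed in the description (close as a single LeaveOnClose event, with the app thread only draining background events so callbacks can run) could be sketched roughly as below. This is an illustration under stated assumptions, not the actual consumer code: the class LeaveOnCloseSketch, its fields, and the logged step strings are all hypothetical, and a single-thread executor stands in for the background thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the proposal; none of these names come from the Kafka client.
public class LeaveOnCloseSketch {
    // Stand-in for the subscription state: read and mutated only on the background thread.
    private final Set<String> assignment = new TreeSet<>();
    // Events the background thread hands back to the app thread (e.g. "run this callback").
    private final BlockingQueue<Runnable> backgroundEvents = new LinkedBlockingQueue<>();
    // Stand-in for the consumer's background thread.
    private final ExecutorService background = Executors.newSingleThreadExecutor();
    // Record of what ran where, for inspection.
    public final List<String> steps = Collections.synchronizedList(new ArrayList<>());

    public LeaveOnCloseSketch(Set<String> initialAssignment) {
        assignment.addAll(initialAssignment);
    }

    // The single "LeaveOnClose" event: everything that touches the assignment runs in the background.
    private CompletableFuture<Void> leaveOnClose() {
        CompletableFuture<Void> done = new CompletableFuture<>();
        background.execute(() -> {
            List<String> revoked = new ArrayList<>(assignment);
            CompletableFuture<Void> callbackDone = new CompletableFuture<>();
            // Ask the app thread to run the rebalance callback.
            backgroundEvents.add(() -> {
                steps.add("app: onPartitionsRevoked " + revoked);
                callbackDone.complete(null);
            });
            // Once the callback has run, continue on the background thread.
            callbackDone.thenRunAsync(() -> {
                assignment.clear();
                steps.add("background: clear assignment");
                steps.add("background: send leave group HB");
                done.complete(null);
            }, background);
        });
        return done;
    }

    // App-thread close: trigger the event, then process background events until it completes.
    public void close() {
        CompletableFuture<Void> done = leaveOnClose();
        try {
            while (!done.isDone()) {
                Runnable event = backgroundEvents.poll(10, TimeUnit.MILLISECONDS);
                if (event != null) {
                    event.run();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while closing", e);
        } finally {
            background.shutdown();
        }
    }

    public static void main(String[] args) {
        LeaveOnCloseSketch consumer = new LeaveOnCloseSketch(Set.of("topic-0", "topic-1"));
        consumer.close();
        consumer.steps.forEach(System.out::println);
    }
}
```

In this sketch the app thread never writes to the assignment; it only runs the callback that the background thread asks for, which is the property the description is after.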