lianetm commented on code in PR #16686: URL: https://github.com/apache/kafka/pull/16686#discussion_r1747345639
########## clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java: ########## @@ -1302,6 +1304,42 @@ private void releaseAssignmentAndLeaveGroup(final Timer timer) { } } + /** + * The unsubscribe process requires a handful of back-and-forth trips between the application thread and + * the background thread: + * + * <ol> + * <li> + * Application thread: enqueue {@link UnsubscribeEvent} + * </li> + * <li> + * Background thread: process {@link UnsubscribeEvent} and + * enqueue {@link ConsumerRebalanceListenerCallbackNeededEvent} + * </li> + * <li> + * Application thread: process {@link ConsumerRebalanceListenerCallbackNeededEvent}, + * invoke appropriate {@link ConsumerRebalanceListener} method, and + * enqueue {@link ConsumerRebalanceListenerCallbackCompletedEvent} + * </li> + * <li> + * Background thread: process {@link ConsumerRebalanceListenerCallbackCompletedEvent} and + * enqueue {@link NetworkClientDelegate.UnsentRequest} to send the + * {@link ConsumerGroupHeartbeatRequest} to leave the consumer group + * </li> + * </ol> + * + * In cases where the incoming {@link Timer timer} has very little remaining time, e.g. 0, it is impossible + * to perform the thread switches necessary to leave the group. Therefore, in cases where there isn't much + * of a timeout left, increase it slightly (presently 1000 ms.) to improve the chances for the consumer to + * properly leave the group. Review Comment: very interesting, and good that you tried out the close(0) approach because it reveals a bigger gap we still have, regardless of the interrupt issue: close with low timeouts may still not send the leave. This is definitely bigger than the interrupt, so what about we tackle this separately, to properly think about the best approach (this wait for 1000 is definitely one but seems brittle, what's the magic number there?). We could maybe simply have a way to GenerateLeaveRequestEvent that we can explicitly call and wait on when we know we didn't have time to Unsubscribe properly). We should also add an integration test like the one you have here but for close(0). What about: 1. unblock this PR, only for the interrupt (description includes the low timeouts too), with the approach you had of allowing a close with it's original timeout. 2. new jira to review and fix close with low timeout that may not send leave group 3. new jira blocked on #2, to review close behaviour on interrupt, and attempt to respect the interrupted status better, by not allowing the close to take up all the time it wants Thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org