[ https://issues.apache.org/jira/browse/KAFKA-12477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
A. Sophie Blee-Goldman reassigned KAFKA-12477:
----------------------------------------------
Assignee: A. Sophie Blee-Goldman
> Smart rebalancing with dynamic protocol selection
> -------------------------------------------------
>
> Key: KAFKA-12477
> URL: https://issues.apache.org/jira/browse/KAFKA-12477
> Project: Kafka
> Issue Type: Improvement
> Components: consumer
> Reporter: A. Sophie Blee-Goldman
> Assignee: A. Sophie Blee-Goldman
> Priority: Major
> Fix For: 3.0.0
>
>
> Users who want to upgrade their applications and enable the COOPERATIVE
> rebalancing protocol in their consumer apps are required to follow a double
> rolling bounce upgrade path. The reason for this is laid out in the [Consumer
> Upgrades|https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Kafka+Consumer+Incremental+Rebalance+Protocol#KIP429:KafkaConsumerIncrementalRebalanceProtocol-Consumer]
> section of KIP-429. In short, the ConsumerCoordinator picks a rebalancing
> protocol in its constructor based on the list of supported partition
> assignors. It selects the highest protocol supported by every assignor in
> the list and never changes that choice afterwards.
> This is unfortunate because the consumer may keep using an older protocol
> even after every member in the group has been updated to support the newer
> one. After the first rolling bounce of the upgrade, all members will
> have two assignors: "cooperative-sticky" and "range" (or sticky,
> round-robin, etc.). At this point the EAGER protocol will still be
> selected due to the presence of the "range" assignor, but it's the
> "cooperative-sticky" assignor that will ultimately be selected for use in
> rebalances if that assignor is preferred (i.e., positioned first in the list).
> The only reason for the second rolling bounce is to strip off the "range"
> assignor and allow the upgraded members to switch over to COOPERATIVE. We
> can't allow them to use cooperative rebalancing until everyone has been
> upgraded, but once everyone has, it's safe to do so.
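For illustration, the two bounces described above might correspond to consumer configs like the ones below. This is only a sketch: the class name RollingBounceConfigSketch and its methods are hypothetical, and only java.util.Properties is used, so no Kafka client is needed to run it. The key partition.assignment.strategy and the two assignor class names are the standard consumer ones.

```java
import java.util.Properties;

// Hypothetical helper for illustration only; not part of Kafka.
class RollingBounceConfigSketch {
    static final String STRATEGY_KEY = "partition.assignment.strategy";
    static final String COOPERATIVE_STICKY =
        "org.apache.kafka.clients.consumer.CooperativeStickyAssignor";
    static final String RANGE =
        "org.apache.kafka.clients.consumer.RangeAssignor";

    // First rolling bounce: add the cooperative assignor in front of the old one.
    static Properties firstBounce() {
        Properties props = new Properties();
        props.setProperty(STRATEGY_KEY, COOPERATIVE_STICKY + "," + RANGE);
        return props;
    }

    // Second rolling bounce: strip the old assignor so COOPERATIVE can be selected.
    static Properties secondBounce() {
        Properties props = new Properties();
        props.setProperty(STRATEGY_KEY, COOPERATIVE_STICKY);
        return props;
    }

    public static void main(String[] args) {
        System.out.println("first bounce:  " + firstBounce().getProperty(STRATEGY_KEY));
        System.out.println("second bounce: " + secondBounce().getProperty(STRATEGY_KEY));
    }
}
```

The order of the list matters in the first config: the preferred assignor comes first.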
> There is already a way for the client to detect that everyone is on the
> new bytecode: if the CooperativeStickyAssignor is selected by the group
> coordinator, then it is supported by all consumers in the group,
> and therefore everyone must have been upgraded.
> We may be able to save the second rolling bounce by dynamically updating the
> rebalancing protocol inside the ConsumerCoordinator as "the highest protocol
> supported by the assignor chosen by the group coordinator". This means we'll
> still be using EAGER at the first rebalance, since we need to wait
> for this initial rebalance to get the response from the group coordinator.
> But we should take the hint from the chosen assignor rather than dropping
> this information on the floor and sticking with the original protocol.
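To make the difference between the two selection rules concrete, here is a minimal, self-contained sketch. It is not the actual ConsumerCoordinator code: the class ProtocolSelectionSketch, the Assignor stand-in, and both method names are hypothetical, and the only behavior modeled is what the description states (the "range" assignor supports EAGER only, while "cooperative-sticky" supports both protocols).

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model for illustration; not the real ConsumerCoordinator.
class ProtocolSelectionSketch {

    // Declared lowest to highest, so enum ordering gives "highest protocol".
    enum Protocol { EAGER, COOPERATIVE }

    // Hypothetical stand-in for a partition assignor: a name plus supported protocols.
    static final class Assignor {
        final String name;
        final Set<Protocol> supported;

        Assignor(String name, Set<Protocol> supported) {
            this.name = name;
            this.supported = supported;
        }
    }

    static final Assignor RANGE =
        new Assignor("range", EnumSet.of(Protocol.EAGER));
    static final Assignor COOPERATIVE_STICKY =
        new Assignor("cooperative-sticky", EnumSet.of(Protocol.EAGER, Protocol.COOPERATIVE));

    // Current rule as described: highest protocol supported by every configured assignor.
    static Protocol highestCommonProtocol(List<Assignor> assignors) {
        Set<Protocol> common = EnumSet.allOf(Protocol.class);
        for (Assignor assignor : assignors) {
            common.retainAll(assignor.supported);
        }
        if (common.isEmpty()) {
            throw new IllegalArgumentException("assignors share no common protocol");
        }
        return Collections.max(common);
    }

    // Proposed rule: highest protocol supported by the assignor the group coordinator chose.
    static Protocol protocolForChosenAssignor(List<Assignor> assignors, String chosenName) {
        for (Assignor assignor : assignors) {
            if (assignor.name.equals(chosenName)) {
                return Collections.max(assignor.supported);
            }
        }
        throw new IllegalArgumentException("unknown assignor: " + chosenName);
    }

    public static void main(String[] args) {
        // Member config after the first rolling bounce.
        List<Assignor> afterFirstBounce = List.of(COOPERATIVE_STICKY, RANGE);

        // Static rule: stuck on EAGER because "range" is still in the list.
        System.out.println(highestCommonProtocol(afterFirstBounce));
        // Dynamic rule: COOPERATIVE once the group agrees on "cooperative-sticky".
        System.out.println(protocolForChosenAssignor(afterFirstBounce, "cooperative-sticky"));
        // Dynamic rule: still EAGER while some member only supports "range".
        System.out.println(protocolForChosenAssignor(afterFirstBounce, "range"));
    }
}
```

Under these assumptions, the dynamic rule only moves to COOPERATIVE when the coordinator picks "cooperative-sticky", which per the description implies every member has been upgraded.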
--
This message was sent by Atlassian Jira
(v8.3.4#803005)