[ 
https://issues.apache.org/jira/browse/KAFKA-13217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-13217:
------------------------------------
    Description: 
In Kafka Streams, when an instance is shut down via the close() API, we 
intentionally skip sending a LeaveGroup request. This is because often the 
shutdown is not due to a scaling down event but instead some transient closure, 
such as during a rolling bounce. In cases where the instance is expected to 
start up again shortly after, we originally wanted to avoid that member's tasks 
from being redistributed across the remaining group members since this would 
disturb the stable assignment and could cause unnecessary state migration and 
restoration. We also hoped
to limit the disruption to just a single rebalance, rather than forcing the 
group to rebalance once when the member shuts down and then again when it comes 
back up. So it's really an optimization  for the case in which the shutdown is 
temporary.
 
That said, many of those optimizations are no longer necessary or at least much 
less useful given recent features and improvements. For example rebalances are 
now lightweight so skipping the 2nd rebalance is not as worth optimizing for, 
and the new assignor will take into account the actual underlying state for 
each task/partition assignment, rather than just the previous assignment, so 
the assignment should be considerably more stable across bounces and rolling 
restarts. 
 
Given that, it might be time to reconsider this optimization. Alternatively, we 
could introduce another form of the close() API that forces the member to leave 
the group, to be used in event of actual scale down rather than a transient 
bounce.

KIP-812: 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-812%3A+Introduce+another+form+of+the+%60KafkaStreams.close%28%29%60+API+that+forces+the+member+to+leave+the+consumer+group]

  was:
In Kafka Streams, when an instance is shut down via the close() API, we 
intentionally skip sending a LeaveGroup request. This is because often the 
shutdown is not due to a scaling down event but instead some transient closure, 
such as during a rolling bounce. In cases where the instance is expected to 
start up again shortly after, we originally wanted to avoid that member's tasks 
from being redistributed across the remaining group members since this would 
disturb the stable assignment and could cause unnecessary state migration and 
restoration. We also hoped
to limit the disruption to just a single rebalance, rather than forcing the 
group to rebalance once when the member shuts down and then again when it comes 
back up. So it's really an optimization  for the case in which the shutdown is 
temporary.
 
That said, many of those optimizations are no longer necessary or at least much 
less useful given recent features and improvements. For example rebalances are 
now lightweight so skipping the 2nd rebalance is not as worth optimizing for, 
and the new assignor will take into account the actual underlying state for 
each task/partition assignment, rather than just the previous assignment, so 
the assignment should be considerably more stable across bounces and rolling 
restarts. 
 
Given that, it might be time to reconsider this optimization. Alternatively, we 
could introduce another form of the close() API that forces the member to leave 
the group, to be used in event of actual scale down rather than a transient 
bounce.


> Reconsider skipping the LeaveGroup on close() or add an overload that does so
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-13217
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13217
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Assignee: Sayantanu Dey
>            Priority: Major
>              Labels: kip, newbie, newbie++
>             Fix For: 3.3.0
>
>
> In Kafka Streams, when an instance is shut down via the close() API, we 
> intentionally skip sending a LeaveGroup request. This is because often the 
> shutdown is not due to a scaling down event but instead some transient 
> closure, such as during a rolling bounce. In cases where the instance is 
> expected to start up again shortly after, we originally wanted to avoid that 
> member's tasks from being redistributed across the remaining group members 
> since this would disturb the stable assignment and could cause unnecessary 
> state migration and restoration. We also hoped
> to limit the disruption to just a single rebalance, rather than forcing the 
> group to rebalance once when the member shuts down and then again when it 
> comes back up. So it's really an optimization  for the case in which the 
> shutdown is temporary.
>  
> That said, many of those optimizations are no longer necessary or at least 
> much less useful given recent features and improvements. For example 
> rebalances are now lightweight so skipping the 2nd rebalance is not as worth 
> optimizing for, and the new assignor will take into account the actual 
> underlying state for each task/partition assignment, rather than just the 
> previous assignment, so the assignment should be considerably more stable 
> across bounces and rolling restarts. 
>  
> Given that, it might be time to reconsider this optimization. Alternatively, 
> we could introduce another form of the close() API that forces the member to 
> leave the group, to be used in event of actual scale down rather than a 
> transient bounce.
> KIP-812: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-812%3A+Introduce+another+form+of+the+%60KafkaStreams.close%28%29%60+API+that+forces+the+member+to+leave+the+consumer+group]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to