[
https://issues.apache.org/jira/browse/KAFKA-20198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jochen Rauschenbusch updated KAFKA-20198:
-----------------------------------------
Description:
h2. Problem
During some tests, I noticed that many state stores were closed during group
rebalancing triggered by instance scaling. I assumed that the
StickyTaskAssignor was supposed to prevent exactly this. However, with each new
application instance that started the stream, the rebalancing resulted in a
cascade of "Handle new assignments" log entries. Scaling from one to two
application instances (each with ten Kafka stream threads) generated 429 such
entries, which seems excessive. The log entries showed that almost all tasks
were moved to other group members throughout the entire rebalancing phase.
h2. Setup
* Scala application based on Scala 2.13 and Kafka Streams
* The application consumes from a single topic with 450 partitions
* The stream topology implements some stateful aggregations
* Change logging is disabled; only in-memory state stores are used.
* Each app instance is configured to create 10 stream threads
*The following libraries are used:*
* org.apache.kafka:kafka-streams:4.2.0
* org.apache.kafka:kafka-streams-scala_2.13:4.2.0
* org.apache.kafka:kafka-streams-test-utils:4.2.0
The Kafka cluster (v4.1.0) was created with the Strimzi Operator v0.50.0.
I already discussed this behavior with Lucas Brutschy, and it seems to be a bug: [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1770905604912249|Confluent
Slack Channel]
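For illustration, a minimal sketch of the setup described above (application id, bootstrap servers, and topic name are placeholders; the count aggregation is just an example of a stateful operation backed by an in-memory store with change logging disabled):
{code:scala}
import java.util.Properties

import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.scala.kstream.Materialized
import org.apache.kafka.streams.scala.serialization.Serdes._
import org.apache.kafka.streams.state.Stores
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}

object ScalingTestApp extends App {
  // Placeholder application id and bootstrap servers; 10 stream threads per instance
  val props = new Properties()
  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sticky-assignor-test")
  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "my-cluster-kafka-bootstrap:9092")
  props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, "10")

  val builder = new StreamsBuilder()

  // Example stateful aggregation over the single input topic (450 partitions),
  // materialized in an in-memory store with change logging disabled
  builder
    .stream[String, String]("input-topic")
    .groupByKey
    .count()(
      Materialized
        .as[String, Long](Stores.inMemoryKeyValueStore("counts-store"))
        .withLoggingDisabled()
    )

  val streams = new KafkaStreams(builder.build(), props)
  streams.start()
}
{code}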
h2. Further Tests
Implementing a fairly simple Spring Boot app with an absolutely minimal topology
revealed the same behavior. In this case the topology does not use state stores
at all; it just consumes from a single topic (again with 450 partitions) and
logs the key/value combinations. Here too, the rebalancing led to a cascade of
task re-assignments. Again, I configured the app to use 10 stream threads.
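A minimal sketch of that stateless topology (plain Kafka Streams in Scala rather than the actual Spring Boot wiring; the topic name is a placeholder):
{code:scala}
import org.apache.kafka.streams.Topology
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.scala.serialization.Serdes._
import org.slf4j.LoggerFactory

object MinimalTopology {
  private val log = LoggerFactory.getLogger(getClass)

  // Stateless topology: consume a single topic and log every key/value pair
  def build(): Topology = {
    val builder = new StreamsBuilder()
    builder
      .stream[String, String]("input-topic") // again a topic with 450 partitions
      .peek((k, v) => log.info(s"key=$k value=$v"))
    builder.build()
  }
}
{code}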
I also ran another test with the HATaskAssignor. Here the logic seems to first
revoke all assigned partitions and then re-assign the tasks in a round-robin
manner, which appears to be the expected behavior.
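For context, switching between the two assignors was done via configuration. A sketch, assuming the {{task.assignor.class}} setting introduced by KIP-924; the exact config key and the fully qualified assignor class names should be verified against the Kafka Streams version in use:
{code:scala}
import java.util.Properties

val props = new Properties()
// Assumption: KIP-924 public config key (Kafka Streams 3.8+); the value below is a
// placeholder for the fully qualified TaskAssignor implementation to test, e.g. the
// sticky or the high-availability assignor shipped with Kafka Streams.
props.put("task.assignor.class", "<fully.qualified.TaskAssignor.implementation>")
{code}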
Another test using KIP-1071 showed that the sticky task assignment works as
expected there.
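For completeness, the KIP-1071 test was run with the new streams rebalance protocol enabled on the client side, roughly as sketched below (assuming the {{group.protocol}} setting from KIP-1071; the broker side also has to have the protocol enabled):
{code:scala}
import java.util.Properties

val props = new Properties()
// Assumption: client-side switch for the KIP-1071 streams rebalance protocol
props.put("group.protocol", "streams")
{code}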
was:
h2. Problem
During some tests, I noticed that many state stores were closed during group
rebalancing triggered by instance scaling. I assumed that the
StickyTaskAssignor was supposed to prevent exactly this. However, with each new
application instance that started the stream, the rebalancing resulted in a
cascade of "Handle new assignments" log entries. Scaling from one to two
application instances (each with ten Kafka stream threads) generated 429 such
entries, which seems excessive. The log entries showed that almost all tasks
were moved to other group members throughout the entire rebalancing phase.
h2. Setup
* Scala application based on Scala 2.13 and Kafka Streams
* The application consumes from a single topic with 450 partitions
* The stream topology implements some stateful aggregations
* Change logging is disabled; only in-memory state stores are used.
* Each app instance is configured to create 10 stream threads
*The following libraries are used:*
* org.apache.kafka:kafka-streams:4.2.0
* org.apache.kafka:kafka-streams-scala_2.13:4.2.0
* org.apache.kafka:kafka-streams-test-utils:4.2.0
The Kafka cluster (v4.1.0) was created with the Strimzi Operator v0.50.0.
I already discussed this behavior with Lucas Brutschy, and it seems to be a bug: [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1770905604912249|Confluent
Slack Channel]
Implementing a fairly simple Spring Boot app with an absolutely minimal topology
revealed the same behavior. In this case the topology does not use state stores
at all; it just consumes from a single topic (again with 450 partitions) and
logs the key/value combinations. Here too, the rebalancing led to a cascade of
task re-assignments. Again, I configured the app to use 10 stream threads.
I also ran another test with the HATaskAssignor. Here the logic seems to first
revoke all assigned partitions and then re-assign the tasks in a round-robin
manner, which appears to be the expected behavior.
Another test using KIP-1071 showed that the sticky task assignment works as
expected there.
> StickyPartitionAssignor with group protocol classic is not acting sticky
> ------------------------------------------------------------------------
>
> Key: KAFKA-20198
> URL: https://issues.apache.org/jira/browse/KAFKA-20198
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 4.1.1
> Reporter: Jochen Rauschenbusch
> Priority: Major
> Attachments: HATaskAssignorLogs.json, StickyTaskAssignorLogs.json
>
>
> h2. Problem
> During some tests, I noticed that many state stores were closed during group
> rebalancing triggered by instance scaling. I assumed that the
> StickyTaskAssignor was supposed to prevent exactly this. However, with each
> new application instance that started the stream, the rebalancing resulted in
> a cascade of "Handle new assignments" log entries. Scaling from one to two
> application instances (each with ten Kafka stream threads) generated 429 such
> entries, which seems excessive. The log entries showed that almost all tasks
> were moved to other group members throughout the entire rebalancing phase.
> h2. Setup
> * Scala application based on Scala 2.13 and Kafka Streams
> * The application consumes from a single topic with 450 partitions
> * The stream topology implements some stateful aggregations
> * Change logging is disabled; only in-memory state stores are used.
> * Each app instance is configured to create 10 stream threads
> *The following libraries are used:*
> * org.apache.kafka:kafka-streams:4.2.0
> * org.apache.kafka:kafka-streams-scala_2.13:4.2.0
> * org.apache.kafka:kafka-streams-test-utils:4.2.0
> The Kafka cluster (v4.1.0) was created with the Strimzi Operator v0.50.0.
> I already discussed this behavior with Lucas Brutschy, and it seems to be a
> bug: [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1770905604912249|Confluent
> Slack Channel]
> h2. Further Tests
> Implementing a fairly simple Spring Boot app with an absolutely minimal
> topology revealed the same behavior. In this case the topology does not use
> state stores at all; it just consumes from a single topic (again with 450
> partitions) and logs the key/value combinations. Here too, the rebalancing led
> to a cascade of task re-assignments. Again, I configured the app to use 10
> stream threads.
> I also ran another test with the HATaskAssignor. Here the logic seems to first
> revoke all assigned partitions and then re-assign the tasks in a round-robin
> manner, which appears to be the expected behavior.
> Another test using KIP-1071 showed that the sticky task assignment works as
> expected there.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)