Jochen Rauschenbusch created KAFKA-20198:
--------------------------------------------

             Summary: StickyPartitionAssignor with group protocol classic is 
not acting sticky
                 Key: KAFKA-20198
                 URL: https://issues.apache.org/jira/browse/KAFKA-20198
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 4.1.1
            Reporter: Jochen Rauschenbusch
         Attachments: HATaskAssignorLogs.json, StickyTaskAssignorLogs.json

Problem:
During some tests, I noticed that many state stores were closed during group 
rebalancing triggered by instance scaling. I assumed that the 
StickyTaskAssignor was supposed to prevent exactly this. However, each new 
application instance that started the stream triggered a rebalance that 
resulted in a cascade of "Handle new assignments" log entries. Scaling from one 
to two application instances (each with ten Kafka Streams threads) generated 
429 such entries, which seems excessive. The log entries showed that almost all 
tasks were moved to other group members throughout the entire rebalancing phase.
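
For comparison, a rough lower bound on the task movement a sticky assignment 
would need can be sketched as follows. This is an illustration only, not the 
assignor's actual algorithm. It assumes one task per input partition (450 
tasks), tasks evenly spread over the threads before scaling, and a balanced 
target in which every thread owns either floor or ceil of tasks/threads. The 
class and method names are hypothetical.

```java
public class StickyMoveEstimate {

    /**
     * Minimum number of tasks that must change owner when the thread count
     * grows from oldThreads to newThreads, assuming balanced ownership
     * before and after (illustrative model, not Kafka's implementation).
     */
    public static int minMoves(int tasks, int oldThreads, int newThreads) {
        int base = tasks / newThreads;      // tasks every thread gets at least
        int extra = tasks % newThreads;     // threads that get one more task
        int oldBase = tasks / oldThreads;
        int oldExtra = tasks % oldThreads;

        int kept = 0;
        for (int t = 0; t < oldThreads; t++) {
            int owned = oldBase + (t < oldExtra ? 1 : 0);
            // Hand the larger quotas to existing threads first so that they
            // keep as many of their current tasks as possible.
            int quota = base + (t < extra ? 1 : 0);
            kept += Math.min(owned, quota);
        }
        return tasks - kept;
    }

    public static void main(String[] args) {
        // 450 tasks, 1 instance x 10 threads -> 2 instances x 10 threads
        System.out.println(minMoves(450, 10, 20)); // prints 220
    }
}
```

Under these assumptions, roughly half of the 450 tasks would have to move and 
the rest could stay on their current threads, which is far from "almost all 
tasks" being reassigned.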

Setup:
- Scala application based on Scala 2.13 and Kafka Streams
- The application consumes from a single topic with 450 partitions
- The stream topology implements some stateful aggregations
- Change logging is disabled; only in-memory state stores are used
- Each app instance is configured to create 10 stream threads
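
A minimal sketch of a configuration matching this setup is shown below, using 
plain string keys so that it compiles without the Kafka libraries. The 
application id and bootstrap address are hypothetical placeholders. The report 
does not state how the assignor was selected, so the `task.assignor.class` 
entry and its class name (the public KIP-924 assignor, as far as I know) are 
assumptions that should be checked against the StreamsConfig documentation.

```java
import java.util.Properties;

public class ClassicStickySetup {

    /** Builds Streams properties resembling the setup described in the report. */
    public static Properties streamsProperties() {
        Properties props = new Properties();
        // Hypothetical placeholders, not taken from the report.
        props.setProperty("application.id", "sticky-assignor-repro");
        props.setProperty("bootstrap.servers", "localhost:9092");
        // From the report: 10 stream threads per application instance.
        props.setProperty("num.stream.threads", "10");
        // From the report: classic group protocol (as opposed to "streams").
        props.setProperty("group.protocol", "classic");
        // Assumption: assignor selected through the KIP-924 config key.
        props.setProperty("task.assignor.class",
            "org.apache.kafka.streams.processor.assignment.assignors.StickyTaskAssignor");
        return props;
    }

    public static void main(String[] args) {
        streamsProperties().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The resulting `Properties` object would normally be passed to the 
`KafkaStreams` constructor together with the topology.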

The following libraries are used:
```
org.apache.kafka:kafka-streams:4.2.0
org.apache.kafka:kafka-streams-scala_2.13:4.2.0
org.apache.kafka:kafka-streams-test-utils:4.2.0
```

The Kafka cluster, based on v4.1.0, was created with Strimzi Operator v0.50.0.

I already discussed this behavior with [~lucasbru] and it seems to be a bug:
[https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1770905604912249]

A very simple Spring Boot app with an absolutely minimal topology showed the 
same behavior. The topology in this case didn't use state stores at all. It 
just consumes from a single topic (again 450 partitions) and logs the key/value 
pairs. Here, too, the rebalancing led to a cascade of task reassignments. 
Again, I configured the app to use 10 stream threads.

I also ran another test with the HATaskAssignor. Here the logic seems to first 
revoke all assigned partitions and then reassign the tasks in a round-robin 
manner, which seems to be as expected.

Another test using KIP-1071 (the Streams rebalance protocol) showed that sticky 
task assignment works as expected there.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
