[
https://issues.apache.org/jira/browse/KAFKA-20198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ziyun Fu reassigned KAFKA-20198:
--------------------------------
Assignee: (was: Ziyun Fu)
> StickyPartitionAssignor with group protocol classic is not acting sticky
> ------------------------------------------------------------------------
>
> Key: KAFKA-20198
> URL: https://issues.apache.org/jira/browse/KAFKA-20198
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 4.1.1
> Reporter: Jochen Rauschenbusch
> Priority: Major
> Attachments: HATaskAssignorLogs.json, StickyTaskAssignorLogs.json
>
>
> h2. Problem
> During some tests, I noticed that many state stores were closed during group
> rebalancing triggered by instance scaling. I assumed that the
> StickyTaskAssignor was supposed to prevent exactly this. However, with each
> new application instance that started, the rebalancing resulted in
> a cascade of "Handle new assignments" log entries. Scaling from one to two
> application instances (each with ten Kafka Streams threads) generated 429 such
> entries, which seems excessive. The log entries showed that almost all tasks
> were moved to other group members throughout the entire rebalancing phase.
> h2. Setup
> * Scala application based on Scala 2.13 and Kafka Streams
> * The application consumes from a single topic with 450 partitions
> * The stream topology implements some stateful aggregations
> * Changelogging is disabled; only in-memory state stores are used
> * Each app instance is configured to create 10 stream threads
> *The following libraries are used:*
> * org.apache.kafka:kafka-streams:4.2.0
> * org.apache.kafka:kafka-streams-scala_2.13:4.2.0
> * org.apache.kafka:kafka-streams-test-utils:4.2.0
> The Kafka cluster, based on v4.1.0, was created with [Strimzi Operator
> v0.50.0|https://github.com/strimzi/strimzi-kafka-operator/releases/tag/0.50.0].
> I already discussed this behavior with [~lucasbru] and it seems to be a bug:
> [Confluent Slack
> Channel|https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1770905604912249]
> h2. Further Tests
> Implementing a very simple Spring Boot app with an absolutely minimal
> topology revealed the same behavior. The topology in this case didn't use
> state stores at all. It just consumes from a single topic (again 450
> partitions) and logs the key/value pairs. Here too,
> the rebalancing led to a cascade of task reassignments. Again, I configured
> the app to use 10 stream threads.
> I also ran another test with the HATaskAssignor. Here the logic seems to first
> revoke all assigned partitions and then reassign the tasks in a round-robin
> manner, which seems to be as expected.
> Another test using KIP-1071 showed that sticky task assignment works as
> expected there.
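
For reference, the amount of task movement described in the report can be quantified by comparing the task-to-member assignment before and after a rebalance and counting the tasks whose owner changed. The following is only a minimal sketch and not part of Kafka Streams: the class `StickinessCheck`, the method `movedTasks`, and the member IDs `instance-1` and `instance-2` are hypothetical, and the example assumes one task per input partition (450 tasks) and looks at instances rather than individual stream threads.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a Kafka Streams API.
class StickinessCheck {

    /**
     * Counts tasks that are present in both assignments but changed owner.
     * Keys are task IDs (e.g. "0_17"), values are member IDs.
     */
    public static int movedTasks(Map<String, String> before, Map<String, String> after) {
        int moved = 0;
        for (Map.Entry<String, String> entry : before.entrySet()) {
            String newOwner = after.get(entry.getKey());
            if (newOwner != null && !newOwner.equals(entry.getValue())) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int taskCount = 450; // assumption: one task per input partition

        Map<String, String> before = new HashMap<>();
        Map<String, String> after = new HashMap<>();
        for (int partition = 0; partition < taskCount; partition++) {
            String taskId = "0_" + partition;
            // Before scaling: a single instance owns every task.
            before.put(taskId, "instance-1");
            // Idealized sticky result: the first half stays, the second half moves.
            after.put(taskId, partition < taskCount / 2 ? "instance-1" : "instance-2");
        }

        // Under these assumptions, only half of the tasks need to change owner.
        System.out.println("moved tasks: " + movedTasks(before, after));
    }
}
```

Under these assumptions, a balanced and fully sticky outcome for scaling from one to two instances moves 225 of the 450 tasks. Feeding assignments extracted from the attached logs into the same comparison would show how far the observed behavior deviates from that.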