[
https://issues.apache.org/jira/browse/KAFKA-20198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jochen Rauschenbusch updated KAFKA-20198:
-----------------------------------------
Description:
h2. Problem
During some tests, I noticed that many state stores were closed during group
rebalancing triggered by instance scaling. I assumed that the
StickyTaskAssignor was supposed to prevent exactly this. However, with each new
application instance that started the stream, the rebalancing resulted in a
cascade of "Handle new assignments" log entries. Scaling from one to two
application instances (each with ten Kafka stream threads) generated 429 such
entries, which seems excessive. The log entries showed that almost all tasks
were moved to other group members throughout the entire rebalancing phase.
h2. Setup
* Scala application based on Scala 2.13 and Kafka Streams
* The application consumes from a single topic with 450 partitions
* The stream topology implements some stateful aggregations
* Change logging is disabled; only in-memory state stores are used.
* Each app instance is configured to create 10 stream threads
*The following libraries are used:*
* org.apache.kafka:kafka-streams:4.2.0
* org.apache.kafka:kafka-streams-scala_2.13:4.2.0
* org.apache.kafka:kafka-streams-test-utils:4.2.0
The Kafka cluster (v4.1.0) was created with the Strimzi Operator v0.50.0.
I already discussed this behavior with Lucas Brutschy, and it seems to be a bug: [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1770905604912249|Confluent
Slack Channel]
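For illustration, a minimal sketch of the setup described above (application id, bootstrap servers, and topic name are placeholders; the count aggregation is just an example of a stateful operation backed by an in-memory store with change logging disabled):
{code:scala}
import java.util.Properties

import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.scala.kstream.Materialized
import org.apache.kafka.streams.scala.serialization.Serdes._
import org.apache.kafka.streams.state.Stores
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}

object ScalingTestApp extends App {
  // Placeholder application id and bootstrap servers; 10 stream threads per instance
  val props = new Properties()
  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sticky-assignor-test")
  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "my-cluster-kafka-bootstrap:9092")
  props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, "10")

  val builder = new StreamsBuilder()

  // Example stateful aggregation over the single input topic (450 partitions),
  // materialized in an in-memory store with change logging disabled
  builder
    .stream[String, String]("input-topic")
    .groupByKey
    .count()(
      Materialized
        .as[String, Long](Stores.inMemoryKeyValueStore("counts-store"))
        .withLoggingDisabled()
    )

  val streams = new KafkaStreams(builder.build(), props)
  streams.start()
}
{code}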
h2. Further Tests
Implementing a fairly simple Spring Boot app with an absolutely minimal topology
revealed the same behavior. In this case the topology does not use state stores
at all; it just consumes from a single topic (again with 450 partitions) and
logs the key/value combinations. Here too, the rebalancing led to a cascade of
task re-assignments. Again, I configured the app to use 10 stream threads.
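A minimal sketch of that stateless topology (plain Kafka Streams in Scala rather than the actual Spring Boot wiring; the topic name is a placeholder):
{code:scala}
import org.apache.kafka.streams.Topology
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.scala.serialization.Serdes._
import org.slf4j.LoggerFactory

object MinimalTopology {
  private val log = LoggerFactory.getLogger(getClass)

  // Stateless topology: consume a single topic and log every key/value pair
  def build(): Topology = {
    val builder = new StreamsBuilder()
    builder
      .stream[String, String]("input-topic") // again a topic with 450 partitions
      .peek((k, v) => log.info(s"key=$k value=$v"))
    builder.build()
  }
}
{code}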
I also ran another test with the HATaskAssignor. Here the logic seems to first
revoke all assigned partitions and then re-assign the tasks in a round-robin
manner, which appears to be the expected behavior.
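For context, switching between the two assignors was done via configuration. A sketch, assuming the {{task.assignor.class}} setting introduced by KIP-924; the exact config key and the fully qualified assignor class names should be verified against the Kafka Streams version in use:
{code:scala}
import java.util.Properties

val props = new Properties()
// Assumption: KIP-924 public config key (Kafka Streams 3.8+); the value below is a
// placeholder for the fully qualified TaskAssignor implementation to test, e.g. the
// sticky or the high-availability assignor shipped with Kafka Streams.
props.put("task.assignor.class", "<fully.qualified.TaskAssignor.implementation>")
{code}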
Another test using KIP-1071 showed that the sticky task assignment works as
expected there.
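For completeness, the KIP-1071 test was run with the new streams rebalance protocol enabled on the client side, roughly as sketched below (assuming the {{group.protocol}} setting from KIP-1071; the broker side also has to have the protocol enabled):
{code:scala}
import java.util.Properties

val props = new Properties()
// Assumption: client-side switch for the KIP-1071 streams rebalance protocol
props.put("group.protocol", "streams")
{code}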
was:
h2. Problem
During some tests, I noticed that many state stores were closed during group
rebalancing triggered by instance scaling. I assumed that the
StickyTaskAssignor was supposed to prevent exactly this. However, with each new
application instance that started the stream, the rebalancing resulted in a
cascade of "Handle new assignments" log entries. Scaling from one to two
application instances (each with ten Kafka stream threads) generated 429 such
entries, which seems excessive. The log entries showed that almost all tasks
were moved to other group members throughout the entire rebalancing phase.
h2. Setup
* Scala application based on Scala 2.13 and Kafka Streams
* The application consumes from a single topic with 450 partitions
* The stream topology implements some stateful aggregations
* Change logging is disabled; only in-memory state stores are used.
* Each app instance is configured to create 10 stream threads
*The following libraries are used:*
* org.apache.kafka:kafka-streams:4.2.0
* org.apache.kafka:kafka-streams-scala_2.13:4.2.0
* org.apache.kafka:kafka-streams-test-utils:4.2.0
The Kafka cluster (v4.1.0) was created with the Strimzi Operator v0.50.0.
I already discussed this behavior with Lucas Brutschy, and it seems to be a bug: [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1770905604912249|Confluent
Slack Channel]
Implementing a fairly simple Spring Boot app with an absolutely minimal topology
revealed the same behavior. In this case the topology does not use state stores
at all; it just consumes from a single topic (again with 450 partitions) and
logs the key/value combinations. Here too, the rebalancing led to a cascade of
task re-assignments. Again, I configured the app to use 10 stream threads.
I also ran another test with the HATaskAssignor. Here the logic seems to first
revoke all assigned partitions and then re-assign the tasks in a round-robin
manner, which appears to be the expected behavior.
Another test using KIP-1071 showed that the sticky task assignment works as
expected there.
> StickyPartitionAssignor with group protocol classic is not acting sticky
> ------------------------------------------------------------------------
>
> Key: KAFKA-20198
> URL: https://issues.apache.org/jira/browse/KAFKA-20198
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 4.1.1
> Reporter: Jochen Rauschenbusch
> Priority: Major
> Attachments: HATaskAssignorLogs.json, StickyTaskAssignorLogs.json
>
>
> h2. Problem
> During some tests, I noticed that many state stores were closed during group
> rebalancing triggered by instance scaling. I assumed that the
> StickyTaskAssignor was supposed to prevent exactly this. However, with each
> new application instance that started the stream, the rebalancing resulted in
> a cascade of "Handle new assignments" log entries. Scaling from one to two
> application instances (each with ten Kafka stream threads) generated 429 such
> entries, which seems excessive. The log entries showed that almost all tasks
> were moved to other group members throughout the entire rebalancing phase.
> h2. Setup
> * Scala application based on Scala 2.13 and Kafka Streams
> * The application consumes from a single topic with 450 partitions
> * The stream topology implements some stateful aggregations
> * Change logging is disabled; only in-memory state stores are used.
> * Each app instance is configured to create 10 stream threads
> *The following libraries are used:*
> * org.apache.kafka:kafka-streams:4.2.0
> * org.apache.kafka:kafka-streams-scala_2.13:4.2.0
> * org.apache.kafka:kafka-streams-test-utils:4.2.0
> The Kafka cluster (v4.1.0) was created with the Strimzi Operator v0.50.0.
> I already discussed this behavior with Lucas Brutschy, and it seems to be a
> bug: [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1770905604912249|Confluent
> Slack Channel]
> h2. Further Tests
> Implementing a fairly simple Spring Boot app with an absolutely minimal
> topology revealed the same behavior. In this case the topology does not use
> state stores at all; it just consumes from a single topic (again with 450
> partitions) and logs the key/value combinations. Here too, the rebalancing led
> to a cascade of task re-assignments. Again, I configured the app to use 10
> stream threads.
> I also ran another test with the HATaskAssignor. Here the logic seems to first
> revoke all assigned partitions and then re-assign the tasks in a round-robin
> manner, which appears to be the expected behavior.
> Another test using KIP-1071 showed that the sticky task assignment works as
> expected there.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)