Gangadharan created KAFKA-18974:
-----------------------------------

             Summary: Uneven distribution of topic partitions across consumers while using Cooperative Sticky Assignor
                 Key: KAFKA-18974
                 URL: https://issues.apache.org/jira/browse/KAFKA-18974
             Project: Kafka
          Issue Type: Bug
          Components: clients, consumer
    Affects Versions: 3.8.1
            Reporter: Gangadharan

I came across a scenario where the partitions of a given topic are spread unevenly across consumer threads. The topic with high TPS (e.g. 85% of the traffic) had more partitions than the topics with low TPS (e.g. 15% of the traffic). The consumer threads had subscribed to both sets of topics. After assignment, some of the consumer threads ended up with more partitions of the low-TPS topics, and others with more partitions of the high-TPS topic. As a result, the pods whose consumer threads held more partitions of the high-TPS topic had to do more work, which resulted in higher lag.

If we choose round robin instead, the distribution is even between threads and across pods, but then we are limited by the stop-the-world rebalance.

An issue was already raised and fixed in this context, but the fix does not solve the whole problem:

[KAFKA-16277] CooperativeStickyAssignor does not spread topics evenly among consumer group

I suspect the reason is that, during a rebalance, only the partitions that are supposed to be moved away from their existing consumers are sorted and distributed. There is no logic to also check whether the retained partitions should be moved to ensure an even spread across consumers.

Below is a sample test: 2 pods with 6 consumer threads each, and two topics with 18 partitions each (test_topic_1 has a higher inflow than test_topicone_1). As can be seen, with the cooperative sticky strategy test_topic_1 is concentrated in Pod1, which therefore starts to build up lag. With round robin, the topic is distributed between the pods.

Note: The sample test uses the same partition count for both topics only for ease of understanding. The behavior appears to be the same irrespective of the partition counts of the topics.

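The skew described above can be quantified by counting, per pod, how many partitions of each topic its consumer threads own. The following is only a minimal sketch: the helper name `topic_counts_per_pod` and the dictionary layout are assumptions made for illustration, and the input is the assignment listed in this report, transcribed by hand.

```python
from collections import Counter

def topic_counts_per_pod(assignment):
    """For each pod, count how many partitions of each topic its consumers own."""
    result = {}
    for pod, consumers in assignment.items():
        counts = Counter()
        for partitions in consumers:
            for tp in partitions:
                # "test_topic_1-11" -> topic "test_topic_1", partition "11"
                topic, _partition = tp.rsplit("-", 1)
                counts[topic] += 1
        result[pod] = dict(counts)
    return result

# Assignments copied from this report (one inner list per consumer thread).
COOPERATIVE_STICKY = {
    "Pod1": [
        ["test_topic_1-1", "test_topic_1-3", "test_topicone_1-1"],
        ["test_topic_1-11", "test_topic_1-6", "test_topicone_1-6"],
        ["test_topic_1-12", "test_topic_1-7", "test_topicone_1-7"],
        ["test_topic_1-0", "test_topic_1-8", "test_topicone_1-0"],
        ["test_topic_1-2", "test_topic_1-9", "test_topicone_1-2"],
        ["test_topic_1-10", "test_topic_1-4", "test_topicone_1-4"],
    ],
    "Pod2": [
        ["test_topic_1-13", "test_topicone_1-13", "test_topicone_1-5"],
        ["test_topic_1-14", "test_topicone_1-14", "test_topicone_1-8"],
        ["test_topic_1-15", "test_topicone_1-15", "test_topicone_1-9"],
        ["test_topic_1-16", "test_topicone_1-10", "test_topicone_1-16"],
        ["test_topic_1-17", "test_topicone_1-11", "test_topicone_1-17"],
        ["test_topic_1-5", "test_topicone_1-12", "test_topicone_1-3"],
    ],
}

ROUND_ROBIN = {
    "Pod1": [
        ["test_topic_1-0", "test_topic_1-12", "test_topicone_1-6"],
        ["test_topic_1-9", "test_topicone_1-15", "test_topicone_1-3"],
        ["test_topic_1-14", "test_topic_1-2", "test_topicone_1-8"],
        ["test_topic_1-16", "test_topic_1-4", "test_topicone_1-10"],
        ["test_topic_1-6", "test_topicone_1-0", "test_topicone_1-12"],
        ["test_topic_1-10", "test_topicone_1-16", "test_topicone_1-4"],
    ],
    "Pod2": [
        ["test_topic_1-8", "test_topicone_1-14", "test_topicone_1-2"],
        ["test_topic_1-15", "test_topic_1-3", "test_topicone_1-9"],
        ["test_topic_1-11", "test_topicone_1-19", "test_topicone_1-5"],  # "-19" as given in the report
        ["test_topic_1-1", "test_topic_1-13", "test_topicone_1-7"],
        ["test_topic_1-17", "test_topic_1-5", "test_topicone_1-11"],
        ["test_topic_1-7", "test_topicone_1-1", "test_topicone_1-13"],
    ],
}

if __name__ == "__main__":
    print(topic_counts_per_pod(COOPERATIVE_STICKY))
    # {'Pod1': {'test_topic_1': 12, 'test_topicone_1': 6}, 'Pod2': {'test_topic_1': 6, 'test_topicone_1': 12}}
    print(topic_counts_per_pod(ROUND_ROBIN))
    # {'Pod1': {'test_topic_1': 9, 'test_topicone_1': 9}, 'Pod2': {'test_topic_1': 9, 'test_topicone_1': 9}}
```

Every consumer thread owns exactly 3 partitions under both strategies, so the total count per consumer is balanced in both cases. The difference is in the per-topic split: under cooperative sticky, Pod1 holds 12 of the 18 test_topic_1 partitions, while round robin gives each pod 9.
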
Cooperative Sticky:

Pod1
c--> consumer 1912486590767 [test_topic_1-1, test_topic_1-3, test_topicone_1-1]
c--> consumer 1922696734819 [test_topic_1-11, test_topic_1-6, test_topicone_1-6]
c--> consumer 1941340051228 [test_topic_1-12, test_topic_1-7, test_topicone_1-7]
c--> consumer 1940955938996 [test_topic_1-0, test_topic_1-8, test_topicone_1-0]
c--> consumer 1941837822481 [test_topic_1-2, test_topic_1-9, test_topicone_1-2]
c--> consumer 1942719746188 [test_topic_1-10, test_topic_1-4, test_topicone_1-4]

Pod2
c--> consumer 1941486742305 [test_topic_1-13, test_topicone_1-13, test_topicone_1-5]
c--> consumer 1941837974018 [test_topic_1-14, test_topicone_1-14, test_topicone_1-8]
c--> consumer 1942719897724 [test_topic_1-15, test_topicone_1-15, test_topicone_1-9]
c--> consumer 1942696886353 [test_topic_1-16, test_topicone_1-10, test_topicone_1-16]
c--> consumer 1941340202762 [test_topic_1-17, test_topicone_1-11, test_topicone_1-17]
c--> consumer 1940956090534 [test_topic_1-5, test_topicone_1-12, test_topicone_1-3]

-----------------------------------------------------------------------------------------

Round Robin:

Pod1
c--> consumer 1941408797822 [test_topic_1-0, test_topic_1-12, test_topicone_1-6]
c--> consumer 1941456423553 [test_topic_1-9, test_topicone_1-15, test_topicone_1-3]
c--> consumer 1942070859325 [test_topic_1-14, test_topic_1-2, test_topicone_1-8]
c--> consumer 1941385036886 [test_topic_1-16, test_topic_1-4, test_topicone_1-10]
c--> consumer 1941105638483 [test_topic_1-6, test_topicone_1-0, test_topicone_1-12]
c--> consumer 1941885698382 [test_topic_1-10, test_topicone_1-16, test_topicone_1-4]

Pod2
c--> consumer 1941456538287 [test_topic_1-8, test_topicone_1-14, test_topicone_1-2]
c--> consumer 1942070974058 [test_topic_1-15, test_topic_1-3, test_topicone_1-9]
c--> consumer 1941885813119 [test_topic_1-11, test_topicone_1-19, test_topicone_1-5]
c--> consumer 1941408912555 [test_topic_1-1, test_topic_1-13, test_topicone_1-7]
c--> consumer 1941385151618 [test_topic_1-17, test_topic_1-5, test_topicone_1-11]
c--> consumer 1941105753216 [test_topic_1-7, test_topicone_1-1, test_topicone_1-13]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)