[ https://issues.apache.org/jira/browse/KAFKA-16277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cameron Redpath updated KAFKA-16277:
------------------------------------
    Description: 
Consider the following scenario:

`topic-1`: 12 partitions

`topic-2`: 12 partitions

 

Of note, `topic-1` receives approximately 10 times more messages than `topic-2`.

 

Both topics are consumed by a single application, in a single consumer group, which scales under load. Each member of the consumer group subscribes to both topics. The `partition.assignment.strategy` in use is `org.apache.kafka.clients.consumer.CooperativeStickyAssignor`. The application may start with one consumer, which consumes all partitions from both topics.
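
For reference, here is a minimal sketch of the consumer setup described above. The bootstrap address, group id, and deserializers are placeholders rather than details from our deployment; only the subscription and the assignor setting matter here.

```java
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class BothTopicsConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-application");          // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        // The assignment strategy this report is about.
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // Every member of the group subscribes to both topics.
        consumer.subscribe(Arrays.asList("topic-1", "topic-2"));
    }
}
```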

 

The problem begins when the application scales up to two consumers: all partitions from `topic-1` go to one consumer, and all partitions from `topic-2` go to the other. Because one topic receives far more messages than the other, this produces a very imbalanced group in which one consumer receives roughly 10x the traffic of the other purely because of partition assignment.

 

This is the issue we are currently seeing in our cluster. See this graph of the number of messages processed by each consumer as the group scales from one to four consumers:

!image-2024-02-19-13-00-28-306.png|width=537,height=612!

Things to note from this graphic:
 * With two consumers, each topic's partitions all go to a single consumer
 * With three consumers, each topic's partitions are split between two consumers
 * With four consumers, each topic's partitions are split between three consumers
 * The total number of messages processed by each consumer is very imbalanced throughout the entire period

 

With regard to the number of _partitions_ assigned to each consumer, the group is balanced. However, the assignment appears to be biased so that partitions from the same topic go to the same consumer. In our scenario this leads to very poor load distribution: with two consumers each holding 12 partitions, the consumer owning `topic-1` handles roughly 10/11 (about 91%) of all messages.

 

I question whether the behaviour of the assignor should be revised so that each topic's partitions are spread maximally across all available members of the consumer group. In the above scenario this would give a much more even distribution of load (see the sketch after this list). The behaviour would then be:
 * With two consumers, 6 partitions from each topic go to each consumer
 * With three consumers, 4 partitions from each topic go to each consumer
 * With four consumers, 3 partitions from each topic go to each consumer
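
For illustration only, here is a rough sketch of an assignor that would produce that distribution by round-robining each topic's partitions across the members subscribed to it. The class name is made up, and it deliberately ignores stickiness and the cooperative protocol; it is meant to show the per-topic spread being asked for, not to be a proposed patch:

```java
import java.util.*;

import org.apache.kafka.clients.consumer.ConsumerPartitionAssignor;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.TopicPartition;

public class PerTopicSpreadAssignor implements ConsumerPartitionAssignor {

    @Override
    public String name() {
        return "per-topic-spread"; // hypothetical strategy name
    }

    @Override
    public GroupAssignment assign(Cluster metadata, GroupSubscription groupSubscription) {
        Map<String, Subscription> subscriptions = groupSubscription.groupSubscription();

        // Start every member with an empty partition list.
        Map<String, List<TopicPartition>> assignment = new HashMap<>();
        subscriptions.keySet().forEach(member -> assignment.put(member, new ArrayList<>()));

        // Collect all subscribed topics in a stable order.
        Set<String> topics = new TreeSet<>();
        subscriptions.values().forEach(s -> topics.addAll(s.topics()));

        for (String topic : topics) {
            Integer partitionCount = metadata.partitionCountForTopic(topic);
            if (partitionCount == null)
                continue; // topic not present in cluster metadata

            // Members subscribed to this topic, in a stable order.
            List<String> members = new ArrayList<>();
            for (Map.Entry<String, Subscription> e : subscriptions.entrySet())
                if (e.getValue().topics().contains(topic))
                    members.add(e.getKey());
            Collections.sort(members);

            // Round-robin this topic's partitions across its subscribers,
            // e.g. 12 partitions over 2 members -> 6 and 6 from *this* topic.
            for (int p = 0; p < partitionCount; p++) {
                String member = members.get(p % members.size());
                assignment.get(member).add(new TopicPartition(topic, p));
            }
        }

        Map<String, Assignment> result = new HashMap<>();
        assignment.forEach((member, parts) -> result.put(member, new Assignment(parts)));
        return new GroupAssignment(result);
    }
}
```

With 12 partitions per topic, this gives exactly the 6/4/3-partitions-per-consumer split listed above.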

 

Of note, we only saw this behaviour after migrating to the 
`CooperativeStickyAssignor`. It was not an issue with the default partition 
assignment strategy.

 

It is possible this is intended behaviour, in which case: what is the preferred workaround for our scenario? If we go ahead with the move to `CooperativeStickyAssignor`, our current plan is to limit each consumer to a single topic and run two consumer threads per instance of the application. A rough sketch of that setup follows.
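
For concreteness, a sketch of that workaround with placeholder names throughout. Whether the two threads keep a single shared group or use one group per topic is an open choice; one group per topic is assumed here:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SingleTopicConsumers {
    public static void main(String[] args) {
        // One consumer thread per topic, so the assignor only ever balances
        // a single topic's partitions within each group.
        for (String topic : List.of("topic-1", "topic-2")) {
            Thread worker = new Thread(() -> runConsumer(topic), "consumer-" + topic);
            worker.start();
        }
    }

    private static void runConsumer(String topic) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-application-" + topic); // assumption: one group per topic
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of(topic));
            while (true) {
                consumer.poll(Duration.ofMillis(500))
                        .forEach(rec -> { /* process record */ });
            }
        }
    }
}
```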

> CooperativeStickyAssignor does not spread topics evenly among consumer group
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-16277
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16277
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Cameron Redpath
>            Priority: Major
>         Attachments: image-2024-02-19-13-00-28-306.png
>


