[ https://issues.apache.org/jira/browse/KAFKA-14071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741577#comment-17741577 ]
Hervé RIVIERE commented on KAFKA-14071:
---------------------------------------

(I'm working in the same team as [~q.xu].)

We identified that the issue is correlated with the log segment size. When thousands of consumers are reading the same partition and a leader reassignment is triggered, the larger the segment size, the more saturated the Kafka I/O (request handler) threads become.

Below is a screenshot of an experiment run with 750 consumers (50 instances, each launching 15 Java consumers with the default configuration):

!Screenshot 2023-06-08 at 16.32.02.png|width=853,height=332!

As we have not yet been able to identify the cause on the server side, nor a specific client configuration change that helps, we ended up setting a small segment size (300 KB) for this specific topic.

> Kafka request handler threads saturated when moving a partition
> ---------------------------------------------------------------
>
> Key: KAFKA-14071
> URL: https://issues.apache.org/jira/browse/KAFKA-14071
> Project: Kafka
> Issue Type: Bug
> Components: core
> Reporter: Qinghui Xu
> Priority: Major
> Attachments: Screenshot 2023-06-08 at 16.32.02.png
>
>
> *Kafka version:* 2.7.1
>
> *Our scenario:*
> Each server has 72 cores and runs around 100 request handler threads and 50 network handler threads.
> Many (a few hundred) readers consume from the same topic (and the same partitions), because they don't belong to the same consumer group.
> Many (hundreds of) producers also produce data into the same topic, with a throughput of around 100 KB/s.
>
> *The procedure to reproduce it:*
> Move a partition's leader replica to a new broker that was not a follower (meaning it has no data for that partition).
>
> *Observation:*
> All Kafka request handler threads are overloaded. Analysis of the thread dump suggests that most of them are trying to read the same log segment file, which requires locking a monitor on a specific object in `sun.nio.ch.FileChannelImpl`.
>
> *Other remarks:*
> The problem does not reproduce on a simple leadership transition between existing replicas. For example, when we shut down the leader broker, or move the leader to another follower using the Kafka partition reassignment script, everything works fine.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
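The reproduction step in the description (moving the leader replica to a broker that is not yet a replica) can be expressed as a reassignment plan for `kafka-reassign-partitions.sh`. The sketch below is a hypothetical helper, not part of Kafka: the class name `LeaderMovePlan`, the topic `my-topic` and broker IDs 1 to 4 are illustrative assumptions. It builds the version-1 reassignment JSON in which the current leader is swapped for a broker with no data for the partition, placed first in the list so that it is the preferred leader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

/**
 * Hypothetical helper (not a Kafka API): builds a reassignment plan in the JSON
 * format read by kafka-reassign-partitions.sh --reassignment-json-file, for the
 * scenario in this issue: the current leader replica is replaced by a broker
 * that holds no data for the partition yet.
 */
public final class LeaderMovePlan {

    private LeaderMovePlan() {}

    /**
     * Returns the target replica list: the current leader is dropped, and
     * newBroker is put first so that it becomes the preferred leader.
     */
    static List<Integer> replaceLeader(List<Integer> currentReplicas, int leader, int newBroker) {
        if (!currentReplicas.contains(leader)) {
            throw new IllegalArgumentException("leader " + leader + " is not a replica");
        }
        if (currentReplicas.contains(newBroker)) {
            // Moving leadership to an existing follower is the case that did NOT reproduce,
            // so this helper refuses it on purpose.
            throw new IllegalArgumentException("broker " + newBroker + " is already a replica");
        }
        List<Integer> target = new ArrayList<>();
        target.add(newBroker);
        for (int id : currentReplicas) {
            if (id != leader) {
                target.add(id); // keep the existing followers, in their current order
            }
        }
        return target;
    }

    /** Serializes a single-partition reassignment as version-1 JSON. */
    static String toJson(String topic, int partition, List<Integer> replicas) {
        StringJoiner ids = new StringJoiner(",", "[", "]");
        for (int id : replicas) {
            ids.add(Integer.toString(id));
        }
        // Kafka topic names only allow ASCII alphanumerics, '.', '_' and '-',
        // so no JSON escaping is done here.
        return String.format(
            "{\"version\":1,\"partitions\":[{\"topic\":\"%s\",\"partition\":%d,\"replicas\":%s}]}",
            topic, partition, ids);
    }

    public static void main(String[] args) {
        // Hypothetical example: partition 0 of "my-topic" lives on brokers 1, 2 and 3,
        // broker 1 is the leader, and broker 4 has no data for the partition.
        List<Integer> target = replaceLeader(List.of(1, 2, 3), 1, 4);
        System.out.println(toJson("my-topic", 0, target));
        // prints {"version":1,"partitions":[{"topic":"my-topic","partition":0,"replicas":[4,2,3]}]}
    }
}
```

Saving the printed plan to a file and passing it to `bin/kafka-reassign-partitions.sh --bootstrap-server <host:port> --reassignment-json-file <file> --execute` should start the move: broker 4 first has to copy the partition from the current leader, and because broker 1 is not in the target list, leadership has to leave it once the reassignment completes. A plan that only reorders the existing replicas (for example `[2,1,3]`) roughly corresponds to the leader-to-follower move that, according to the remarks, did not trigger the problem.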