[ 
https://issues.apache.org/jira/browse/KAFKA-14071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qinghui Xu updated KAFKA-14071:
-------------------------------
    Description: 
*Kafka version:* 2.7.1

*Our scenario:*

Each server has 72 cores and runs with around 100 request handler threads and 
50 network handler threads.
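
For reference, assuming those counts map onto the standard broker settings, 
the equivalent configuration would look like this (values taken from our 
deployment):

{code}
# server.properties (sketch; values from the deployment described above)
# request handler (I/O) threads
num.io.threads=100
# network handler threads
num.network.threads=50
{code}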

Many (a few hundred) consumers read from the same topic (and the same 
partitions), since they do not belong to the same consumer group.

Many (hundreds of) producers also write to the same topic, with a throughput 
of around 100 KB/s.

 

*The procedure to reproduce it:*

Move a partition's leader replica to a new broker that was not previously a 
follower (meaning it holds no data for that partition).
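
For illustration, a sketch of the kind of reassignment we ran (topic name, 
broker ID, and bootstrap address are placeholders; broker 4 stands for the 
new broker holding no data for the partition):

{code}
# move.json: put the partition on a brand-new broker (IDs are placeholders)
cat > move.json <<'EOF'
{"version":1,"partitions":[{"topic":"test-topic","partition":0,"replicas":[4]}]}
EOF

bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file move.json --execute
{code}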

 

*Observation:*

All Kafka request handler threads become saturated. Analysis of a thread dump 
shows that most of them are trying to read the same log segment file, which 
requires acquiring a monitor lock on a specific object inside 
`sun.nio.ch.FileChannelImpl`.
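
To illustrate the locking behaviour (this is a minimal standalone sketch, not 
Kafka code): relative reads on a single shared FileChannel serialize on an 
internal monitor inside `sun.nio.ch.FileChannelImpl`, so a thread dump shows 
most readers blocked on the same lock.

{code:java}
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ContentionSketch {
    public static void main(String[] args) throws Exception {
        // Open one shared channel, as a broker does for a log segment file.
        try (FileChannel ch = FileChannel.open(Paths.get(args[0]),
                                               StandardOpenOption.READ)) {
            Runnable reader = () -> {
                ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
                try {
                    // Relative read: serializes on the channel's internal
                    // position lock, a monitor in sun.nio.ch.FileChannelImpl.
                    while (ch.read(buf) > 0) {
                        buf.clear();
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            };
            Thread[] threads = new Thread[100]; // ~ request handler pool size
            for (int i = 0; i < threads.length; i++) {
                threads[i] = new Thread(reader, "handler-" + i);
                threads[i].start();
            }
            for (Thread t : threads) {
                t.join();
            }
        }
    }
}
{code}

Run against a large file, a thread dump taken while this executes shows all 
but one reader thread BLOCKED on the same monitor inside the channel's read, 
which matches what we observed on the brokers.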

 

*Other remarks:*

The problem is not reproduced by a simple leadership transition between 
existing replicas. For example, shutting down the leader broker, or moving 
leadership to another follower with the kafka-reassign-partitions script, 
works fine.


> Kafka request handler threads saturated when moving a partition
> ---------------------------------------------------------------
>
>                 Key: KAFKA-14071
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14071
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Qinghui Xu
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
