[ 
https://issues.apache.org/jira/browse/KAFKA-10002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-10002.
-----------------------------
    Fix Version/s: 2.7.0
       Resolution: Fixed

Merged the PR to trunk.

> Improve performances of StopReplicaRequest with large number of partitions to 
> be deleted
> ----------------------------------------------------------------------------------------
>
>                 Key: KAFKA-10002
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10002
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: David Jacot
>            Assignee: David Jacot
>            Priority: Major
>             Fix For: 2.7.0
>
>
> I have noticed that StopReplicaRequests with partitions to be deleted are 
> extremely slow when there are more than 2000 partitions, which leads to 
> hitting the request timeout in the controller. A request with 2000 partitions 
> to be deleted still works, but performance degrades significantly as the 
> number increases. For example, a request with 3000 partitions to be deleted 
> takes approx. 60 seconds to be processed.
> A CPU profile shows that most of the time is spent in checkpointing log start 
> offsets and recovery offsets. Almost 90% of the time is there. See attached. 
> When a partition is deleted, the replica manager calls 
> `ReplicaManager#asyncDelete`, which checkpoints recovery offsets and log start 
> offsets. As the checkpoints are per data directory, the checkpointing is done 
> for all the partitions in the directory of the partition to be deleted. In 
> our case, where we have only one data directory, if you delete 1000 
> partitions, we end up checkpointing the same things 1000 times, which is not 
> efficient.
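
The cost described in the quoted report can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the Kafka code and not necessarily how the merged PR addresses the issue: the class `CheckpointSketch`, its methods, and the directory and partition names are all hypothetical, and a "checkpoint" is modeled as writing one entry per partition remaining in the data directory. It compares checkpointing after every single deletion with checkpointing each affected directory once per request.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of per-directory checkpointing; not Kafka's implementation.
public class CheckpointSketch {
    // data directory -> partitions stored in it
    private final Map<String, Set<String>> partitionsByDir = new HashMap<>();
    // partition -> data directory that holds it
    private final Map<String, String> dirOfPartition = new HashMap<>();

    long entriesWritten = 0;   // total checkpoint entries written so far
    int checkpointCalls = 0;   // number of checkpoint file rewrites

    void addPartition(String dir, String partition) {
        partitionsByDir.computeIfAbsent(dir, d -> new HashSet<>()).add(partition);
        dirOfPartition.put(partition, dir);
    }

    // A checkpoint rewrites the file for the whole directory, so its cost
    // grows with the number of partitions still present in that directory.
    private void checkpoint(String dir) {
        Set<String> remaining =
            partitionsByDir.getOrDefault(dir, Collections.<String>emptySet());
        checkpointCalls++;
        entriesWritten += remaining.size();
    }

    // Behavior described in the report: one checkpoint per deleted partition.
    void deleteOneByOne(List<String> partitions) {
        for (String p : partitions) {
            String dir = dirOfPartition.remove(p);
            if (dir == null) continue;          // unknown partition, nothing to do
            partitionsByDir.get(dir).remove(p);
            checkpoint(dir);
        }
    }

    // Alternative: remove everything first, then checkpoint each affected
    // directory exactly once.
    void deleteBatched(List<String> partitions) {
        Set<String> affectedDirs = new HashSet<>();
        for (String p : partitions) {
            String dir = dirOfPartition.remove(p);
            if (dir == null) continue;
            partitionsByDir.get(dir).remove(p);
            affectedDirs.add(dir);
        }
        for (String dir : affectedDirs) {
            checkpoint(dir);
        }
    }

    // Builds a model with a single data directory holding `total` partitions.
    static CheckpointSketch singleDir(int total) {
        CheckpointSketch s = new CheckpointSketch();
        for (int i = 0; i < total; i++) {
            s.addPartition("/data/dir-0", "topic-" + i);
        }
        return s;
    }

    public static void main(String[] args) {
        List<String> toDelete = new java.util.ArrayList<>();
        for (int i = 0; i < 1000; i++) toDelete.add("topic-" + i);

        CheckpointSketch naive = singleDir(3000);
        naive.deleteOneByOne(toDelete);
        CheckpointSketch batched = singleDir(3000);
        batched.deleteBatched(toDelete);

        System.out.println("one-by-one: " + naive.checkpointCalls
            + " checkpoints, " + naive.entriesWritten + " entries");
        System.out.println("batched:    " + batched.checkpointCalls
            + " checkpoints, " + batched.entriesWritten + " entries");
    }
}
```

In this model, deleting 1000 of 3000 partitions in a single directory triggers 1000 checkpoint rewrites one by one versus a single rewrite when batched, which mirrors the "checkpointing the same things 1000 times" observation above.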



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
