[ 
https://issues.apache.org/jira/browse/KAFKA-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-6630.
------------------------------
       Resolution: Fixed
    Fix Version/s: 1.2.0

> Speed up the processing of TopicDeletionStopReplicaResponseReceived events on 
> the controller
> --------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6630
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6630
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>            Reporter: Lucas Wang
>            Assignee: Lucas Wang
>            Priority: Minor
>             Fix For: 1.2.0
>
>
> Problem Statement:
> In a large cluster with many partition replicas, it takes a long time to
> successfully delete a topic.
> Root cause:
> Further analysis shows that for a topic with N replicas, the controller
> receives all N StopReplicaResponses from the brokers within a short time.
> However, handling the N TopicDeletionStopReplicaResponseReceived events
> sequentially, one at a time, takes a long time.
> Specifically, handling each TopicDeletionStopReplicaResponseReceived event
> triggers the following call chain:
> TopicDeletionStopReplicaResponseReceived.process calls
> TopicDeletionManager.completeReplicaDeletion, which calls
> TopicDeletionManager.resumeDeletions, which calls several inefficient
> functions.
> The inefficient functions called inside TopicDeletionManager.resumeDeletions
> include:
> ReplicaStateMachine.areAllReplicasForTopicDeleted
> ReplicaStateMachine.isAtLeastOneReplicaInDeletionStartedState
> ReplicaStateMachine.replicasInState
> Each of the 3 functions above iterates through all the replicas in the
> cluster and selects only the replicas belonging to a given topic. In a
> large cluster with many replicas, these functions can be quite slow.
> Total deletion time for a topic becomes long in the single-threaded
> controller processing model:
> Since the controller processes the queued
> TopicDeletionStopReplicaResponseReceived events sequentially, if processing
> one event takes time t, then processing all events for all replicas of a
> topic takes N * t.
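
The cost described above can be illustrated with a minimal Java sketch. It is not Kafka's controller code: the class ReplicaScanSketch, the Replica type, the State enum and every method name below are hypothetical stand-ins. The sketch contrasts a per-topic check that scans every replica in the cluster with the same check run against a per-topic index. The index is only one possible way to avoid the scan and is not necessarily the change made for this issue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; these types are hypothetical, not Kafka's controller classes.
class ReplicaScanSketch {
    enum State { ONLINE, DELETION_STARTED, DELETION_SUCCESSFUL }

    // Hypothetical stand-in for one partition replica hosted on one broker.
    static final class Replica {
        final String topic;
        final int brokerId;
        final State state;

        Replica(String topic, int brokerId, State state) {
            this.topic = topic;
            this.brokerId = brokerId;
            this.state = state;
        }
    }

    // Full scan: walks the cluster-wide replica list to answer a per-topic question.
    // When the answer is true, every replica in the cluster has been visited.
    static boolean allDeletedByScan(List<Replica> allReplicas, String topic) {
        for (Replica r : allReplicas) {
            if (r.topic.equals(topic) && r.state != State.DELETION_SUCCESSFUL) {
                return false;
            }
        }
        return true;
    }

    // Groups replicas by topic once, so later per-topic checks skip unrelated replicas.
    static Map<String, List<Replica>> indexByTopic(List<Replica> allReplicas) {
        Map<String, List<Replica>> byTopic = new HashMap<>();
        for (Replica r : allReplicas) {
            byTopic.computeIfAbsent(r.topic, k -> new ArrayList<>()).add(r);
        }
        return byTopic;
    }

    // Indexed lookup: visits only the replicas of the requested topic.
    static boolean allDeletedByIndex(Map<String, List<Replica>> byTopic, String topic) {
        for (Replica r : byTopic.getOrDefault(topic, Collections.emptyList())) {
            if (r.state != State.DELETION_SUCCESSFUL) {
                return false;
            }
        }
        return true;
    }

    // Tiny made-up cluster: topic "a" is fully deleted, topic "b" is not.
    static List<Replica> sampleCluster() {
        List<Replica> all = new ArrayList<>();
        all.add(new Replica("a", 0, State.DELETION_SUCCESSFUL));
        all.add(new Replica("a", 1, State.DELETION_SUCCESSFUL));
        all.add(new Replica("b", 0, State.DELETION_STARTED));
        all.add(new Replica("b", 1, State.ONLINE));
        return all;
    }

    public static void main(String[] args) {
        List<Replica> all = sampleCluster();
        Map<String, List<Replica>> byTopic = indexByTopic(all);
        for (String topic : new String[] {"a", "b"}) {
            System.out.println(topic
                + ": scan=" + allDeletedByScan(all, topic)
                + " index=" + allDeletedByIndex(byTopic, topic));
        }
    }
}
```

In this sketch, if the cluster holds R replicas in total and the topic has N of them, running the scan-based check once per event costs on the order of N * R replica visits for the whole topic, while the indexed check costs on the order of N * N.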



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
