[ 
https://issues.apache.org/jira/browse/KAFKA-10734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234891#comment-17234891
 ] 

Jun Rao commented on KAFKA-10734:
---------------------------------

[~luwang], thanks for the jira. In practice, if n is large, m could just be 1, 
right?

> Speedup the processing of LeaderAndIsr request
> ----------------------------------------------
>
>                 Key: KAFKA-10734
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10734
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Lucas Wang
>            Assignee: Lucas Wang
>            Priority: Major
>
> Consider the case where a LeaderAndIsr request contains many partitions for 
> which the broker is asked to become the follower. Let's call these partitions 
> *partitionsToMakeFollower*. Furthermore, let's assume the cluster has n 
> brokers and each broker is configured to have m replica fetchers (via the 
> num.replica.fetchers config). 
> The broker is then likely to have (n-1) * m fetcher threads, i.e. m for each 
> of the other n-1 brokers.
> Processing the LeaderAndIsr request requires:
> 1. removing the "partitionsToMakeFollower" from all of the fetcher threads 
> sequentially so that they won't be fetching from obsolete leaders,
> 2. adding the "partitionsToMakeFollower" to all of the fetcher threads 
> sequentially, and
> 3. shutting down the idle fetcher threads sequentially (by checking the 
> number of partitions held by each fetcher thread).
> On top of that, each of the 3 operations above is handled by the request 
> handler thread (i.e. an I/O thread). To complete each operation, the request 
> handler thread needs to contend for the "partitionMapLock" with the 
> corresponding fetcher thread. In the worst case, the request handler thread 
> is blocked (n-1) * m times while removing the partitions, another (n-1) * m 
> times while adding the partitions, and yet another (n-1) * m times while 
> shutting down the idle fetcher threads.
> Overall, all of this blocking can result in a significant delay in processing 
> the LeaderAndIsr request. A further implication is that if the follower is 
> delayed in fetching from the leader, there could be under-MinISR partitions 
> in the cluster, causing unavailability for clients.
> This ticket is created to track speeding up the processing of the 
> LeaderAndIsr request.
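
The worst-case arithmetic in the description above can be sketched as a small calculation. This is only an illustration of the (n-1) * m counting argument, not Kafka code: the function names are hypothetical, and the values n = 100 and m = 4 are made-up example inputs, not figures from the ticket.

```python
def fetcher_thread_count(n_brokers: int, num_replica_fetchers: int) -> int:
    """Fetcher threads one broker may run: m threads per other broker, (n-1) * m.

    Hypothetical helper for illustration; not part of any Kafka API.
    """
    if n_brokers < 1 or num_replica_fetchers < 1:
        raise ValueError("n_brokers and num_replica_fetchers must be >= 1")
    return (n_brokers - 1) * num_replica_fetchers


def worst_case_lock_acquisitions(n_brokers: int, num_replica_fetchers: int) -> int:
    """Worst-case number of times the request handler thread may block.

    The description lists three sequential passes over all fetcher threads
    (remove partitions, add partitions, shut down idle threads), each of
    which may contend for the lock once per fetcher thread.
    """
    passes = 3
    return passes * fetcher_thread_count(n_brokers, num_replica_fetchers)


if __name__ == "__main__":
    # Illustrative inputs only: a 100-broker cluster with m = 4 versus m = 1.
    for m in (4, 1):
        threads = fetcher_thread_count(100, m)
        blocks = worst_case_lock_acquisitions(100, m)
        print(f"n=100, m={m}: {threads} fetcher threads, up to {blocks} blocking points")
```

Under these assumed inputs, m = 4 gives 396 fetcher threads and up to 1188 potential blocking points, while m = 1 gives 99 and 297, which is why a smaller m shrinks the worst case when n is large.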



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
