[
https://issues.apache.org/jira/browse/KAFKA-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025147#comment-16025147
]
ASF GitHub Bot commented on KAFKA-5323:
---------------------------------------
GitHub user onurkaraman opened a pull request:
https://github.com/apache/kafka/pull/3144
KAFKA-5323: AdminUtils.createTopic should check topic existence upfront
When a topic exists, AdminUtils.createTopic unnecessarily does N+2
zookeeper reads where N is the number of brokers. Here is the breakdown of the
N+2 zookeeper reads:
1. reads the current list of brokers in zookeeper (1 zookeeper read)
2. reads metadata for each broker in zookeeper (N zookeeper reads where N
is the number of brokers)
3. checks for topic existence in zookeeper (1 zookeeper read)
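To make the ordering concrete, here is a minimal Scala sketch of the
pre-patch flow. The helper name createTopicBefore is hypothetical and the
body paraphrases AdminUtils.createTopic rather than quoting it; it assumes
the ZkUtils-era API names (zkUtils.getAllBrokersInCluster, zkUtils.pathExists,
ZkUtils.getTopicPath).

    import kafka.utils.ZkUtils
    import org.apache.kafka.common.errors.TopicExistsException

    // Hypothetical sketch of the pre-patch ordering; not the actual AdminUtils code.
    def createTopicBefore(zkUtils: ZkUtils, topic: String,
                          partitions: Int, replicationFactor: Int): Unit = {
      // Steps 1 and 2: list the broker ids and read each broker's registration
      // znode to build the broker metadata -- 1 + N zookeeper reads.
      val brokers = zkUtils.getAllBrokersInCluster()
      // ... compute the replica assignment from `brokers` ...

      // Step 3: only while writing the assignment is the topic path checked --
      // the final zookeeper read, at which point an existing topic fails.
      if (zkUtils.pathExists(ZkUtils.getTopicPath(topic)))
        throw new TopicExistsException(s"Topic '$topic' already exists.")
      // ... otherwise write the partition assignment under /brokers/topics/<topic> ...
    }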
This can have a larger impact than one might initially suspect. For
instance, a broker only populates its MetadataCache after it has joined the
cluster and the controller sends it an UpdateMetadataRequest. But a broker can
begin processing requests even before registering itself in zookeeper (before
the controller even knows the broker is alive). In other words, a broker can
begin processing MetadataRequests before processing the controller's
UpdateMetadataRequest following broker registration.
Processing these MetadataRequests in this scenario leads to large request
local times and can cause substantial request queue backup, significantly
delaying the broker's processing of its initial UpdateMetadataRequest. Since
the broker hasn't received any UpdateMetadataRequest from the controller yet,
its MetadataCache is empty, so the topics from all the client MetadataRequests
are treated as brand new topics and the broker tries to auto-create them. For
each pre-existing topic queried in the MetadataRequest, auto topic creation
performs the N+2 zookeeper reads mentioned earlier.
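As a rough illustration of that broker-side path (the cache is modeled here
as a plain Set standing in for MetadataCache, and the auto-create call is
left as a comment, so none of this is a verbatim excerpt from KafkaApis):

    // Hypothetical stand-in for a broker handling a MetadataRequest with an
    // empty cache; `cachedTopics` models MetadataCache and
    // `autoCreateTopicsEnable` models the broker's auto-create config flag.
    def handleMetadataSketch(requestedTopics: Seq[String],
                             cachedTopics: Set[String],
                             autoCreateTopicsEnable: Boolean): Seq[String] = {
      val unknownTopics = requestedTopics.filterNot(cachedTopics.contains)
      if (autoCreateTopicsEnable) {
        unknownTopics.foreach { topic =>
          // With an empty cache, every requested topic lands here, and each call
          // pays the N+2 zookeeper reads even though the topic already exists:
          // AdminUtils.createTopic(zkUtils, topic, numPartitions, replicationFactor)
        }
      }
      unknownTopics
    }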
In one bad production scenario (while recovering from KAFKA-4959), this
caused a significant delay in bringing replicas online, as both the initial
LeaderAndIsrRequest and UpdateMetadataRequest from the controller on broker
startup were stuck behind these client MetadataRequests hammering zookeeper.
We can reduce the N+2 reads down to 1 by checking topic existence upfront.
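A matching sketch of the proposed ordering, using the same hypothetical
helper style and ZkUtils-era names as above, where an already-existing topic
costs exactly one zookeeper read:

    import kafka.utils.ZkUtils
    import org.apache.kafka.common.errors.TopicExistsException

    // Hypothetical sketch of the post-patch ordering; not the actual patch diff.
    def createTopicAfter(zkUtils: ZkUtils, topic: String,
                         partitions: Int, replicationFactor: Int): Unit = {
      // Check existence first: 1 zookeeper read, so the common "topic already
      // exists" case fails fast without ever touching broker metadata.
      if (zkUtils.pathExists(ZkUtils.getTopicPath(topic)))
        throw new TopicExistsException(s"Topic '$topic' already exists.")

      // Only genuinely new topics pay the 1 + N broker reads and the assignment write.
      val brokers = zkUtils.getAllBrokersInCluster()
      // ... assign replicas and write the partition assignment as before ...
    }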
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/onurkaraman/kafka KAFKA-5323
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/kafka/pull/3144.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3144
----
commit 90dc1c45db30d579d04529ba6edfead8e198e762
Author: Onur Karaman <[email protected]>
Date: 2017-05-25T18:03:48Z
KAFKA-5323: AdminUtils.createTopic should check topic existence upfront
When a topic exists, AdminUtils.createTopic unnecessarily does N+2
zookeeper reads where N is the number of brokers. Here is the breakdown of the
N+2 zookeeper reads:
1. reads the current list of brokers in zookeeper (1 zookeeper read)
2. reads metadata for each broker in zookeeper (N zookeeper reads where N
is the number of brokers)
3. checks for topic existence in zookeeper (1 zookeeper read)
This can have a larger impact than one might initially suspect. For
instance, a broker only populates its MetadataCache after it has joined the
cluster and the controller sends it an UpdateMetadataRequest. But a broker can
begin processing requests even before registering itself in zookeeper (before
the controller even knows the broker is alive). In other words, a broker can
begin processing MetadataRequests before processing the controller's
UpdateMetadataRequest following broker registration.
Processing these MetadataRequests in this scenario leads to large request
local times and can cause substantial request queue backup, significantly
delaying the broker's processing of its initial UpdateMetadataRequest. Since
the broker hasn't received any UpdateMetadataRequest from the controller yet,
its MetadataCache is empty, so the topics from all the client MetadataRequests
are treated as brand new topics and the broker tries to auto-create them. For
each pre-existing topic queried in the MetadataRequest, auto topic creation
performs the N+2 zookeeper reads mentioned earlier.
In one bad production scenario (while recovering from KAFKA-4959), this
caused a significant delay in bringing replicas online, as both the initial
LeaderAndIsrRequest and UpdateMetadataRequest from the controller on broker
startup were stuck behind these client MetadataRequests hammering zookeeper.
We can reduce the N+2 reads down to 1 by checking topic existence upfront.
----
> AdminUtils.createTopic should check topic existence upfront
> -----------------------------------------------------------
>
> Key: KAFKA-5323
> URL: https://issues.apache.org/jira/browse/KAFKA-5323
> Project: Kafka
> Issue Type: Improvement
> Reporter: Onur Karaman
> Assignee: Onur Karaman
>
> When a topic exists, AdminUtils.createTopic unnecessarily does N+2 zookeeper
> reads where N is the number of brokers. Here is the breakdown of the N+2
> zookeeper reads:
> 1. reads the current list of brokers in zookeeper (1 zookeeper read)
> 2. reads metadata for each broker in zookeeper (N zookeeper reads where N is
> the number of brokers)
> 3. checks for topic existence in zookeeper (1 zookeeper read)
> We can reduce the N+2 reads down to 1 by checking topic existence upfront.