[
https://issues.apache.org/jira/browse/KAFKA-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025147#comment-16025147
]
ASF GitHub Bot commented on KAFKA-5323:
---------------------------------------
GitHub user onurkaraman opened a pull request:
https://github.com/apache/kafka/pull/3144
KAFKA-5323: AdminUtils.createTopic should check topic existence upfront
When a topic exists, AdminUtils.createTopic unnecessarily does N+2
zookeeper reads where N is the number of brokers. Here is the breakdown of the
N+2 zookeeper reads:
1. reads the current list of brokers in zookeeper (1 zookeeper read)
2. reads metadata for each broker in zookeeper (N zookeeper reads where N
is the number of brokers)
3. checks for topic existence in zookeeper (1 zookeeper read)
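To make the ordering concrete, here is a minimal Scala sketch of the
pre-patch flow. The helper name createTopicBefore is hypothetical and the
body paraphrases AdminUtils.createTopic rather than quoting it; it assumes
the ZkUtils-era API names (zkUtils.getAllBrokersInCluster, zkUtils.pathExists,
ZkUtils.getTopicPath).

    import kafka.utils.ZkUtils
    import org.apache.kafka.common.errors.TopicExistsException

    // Hypothetical sketch of the pre-patch ordering; not the actual AdminUtils code.
    def createTopicBefore(zkUtils: ZkUtils, topic: String,
                          partitions: Int, replicationFactor: Int): Unit = {
      // Steps 1 and 2: list the broker ids and read each broker's registration
      // znode to build the broker metadata -- 1 + N zookeeper reads.
      val brokers = zkUtils.getAllBrokersInCluster()
      // ... compute the replica assignment from `brokers` ...

      // Step 3: only while writing the assignment is the topic path checked --
      // the final zookeeper read, at which point an existing topic fails.
      if (zkUtils.pathExists(ZkUtils.getTopicPath(topic)))
        throw new TopicExistsException(s"Topic '$topic' already exists.")
      // ... otherwise write the partition assignment under /brokers/topics/<topic> ...
    }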
This can have a larger impact than one might initially suspect. For
instance, a broker only populates its MetadataCache after it has joined the
cluster and the controller sends it an UpdateMetadataRequest. But a broker can
begin processing requests even before registering itself in zookeeper (before
the controller even knows the broker is alive). In other words, a broker can
begin processing MetadataRequests before processing the controller's
UpdateMetadataRequest following broker registration.
Processing these MetadataRequests in this scenario leads to large request
local times and can cause substantial request queue backup, significantly
delaying the broker's processing of its initial UpdateMetadataRequest. Since
the broker hasn't received any UpdateMetadataRequest from the controller yet,
its MetadataCache is empty, so the topics from all the client MetadataRequests
are treated as brand new topics and the broker tries to auto-create them. For
each pre-existing topic queried in the MetadataRequest, auto topic creation
performs the N+2 zookeeper reads mentioned earlier.
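As a rough illustration of that broker-side path (the cache is modeled here
as a plain Set standing in for MetadataCache, and the auto-create call is
left as a comment, so none of this is a verbatim excerpt from KafkaApis):

    // Hypothetical stand-in for a broker handling a MetadataRequest with an
    // empty cache; `cachedTopics` models MetadataCache and
    // `autoCreateTopicsEnable` models the broker's auto-create config flag.
    def handleMetadataSketch(requestedTopics: Seq[String],
                             cachedTopics: Set[String],
                             autoCreateTopicsEnable: Boolean): Seq[String] = {
      val unknownTopics = requestedTopics.filterNot(cachedTopics.contains)
      if (autoCreateTopicsEnable) {
        unknownTopics.foreach { topic =>
          // With an empty cache, every requested topic lands here, and each call
          // pays the N+2 zookeeper reads even though the topic already exists:
          // AdminUtils.createTopic(zkUtils, topic, numPartitions, replicationFactor)
        }
      }
      unknownTopics
    }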
In one bad production scenario (while recovering from KAFKA-4959), this
caused a significant delay in bringing replicas online, as both the initial
LeaderAndIsrRequest and UpdateMetadataRequest from the controller on broker
startup were stuck behind these client MetadataRequests hammering zookeeper.
We can reduce the N+2 reads down to 1 by checking topic existence upfront.
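A matching sketch of the proposed ordering, using the same hypothetical
helper style and ZkUtils-era names as above, where an already-existing topic
costs exactly one zookeeper read:

    import kafka.utils.ZkUtils
    import org.apache.kafka.common.errors.TopicExistsException

    // Hypothetical sketch of the post-patch ordering; not the actual patch diff.
    def createTopicAfter(zkUtils: ZkUtils, topic: String,
                         partitions: Int, replicationFactor: Int): Unit = {
      // Check existence first: 1 zookeeper read, so the common "topic already
      // exists" case fails fast without ever touching broker metadata.
      if (zkUtils.pathExists(ZkUtils.getTopicPath(topic)))
        throw new TopicExistsException(s"Topic '$topic' already exists.")

      // Only genuinely new topics pay the 1 + N broker reads and the assignment write.
      val brokers = zkUtils.getAllBrokersInCluster()
      // ... assign replicas and write the partition assignment as before ...
    }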
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/onurkaraman/kafka KAFKA-5323
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/kafka/pull/3144.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3144
----
commit 90dc1c45db30d579d04529ba6edfead8e198e762
Author: Onur Karaman <[email protected]>
Date: 2017-05-25T18:03:48Z
KAFKA-5323: AdminUtils.createTopic should check topic existence upfront
When a topic exists, AdminUtils.createTopic unnecessarily does N+2
zookeeper reads where N is the number of brokers. Here is the breakdown of the
N+2 zookeeper reads:
1. reads the current list of brokers in zookeeper (1 zookeeper read)
2. reads metadata for each broker in zookeeper (N zookeeper reads where N
is the number of brokers)
3. checks for topic existence in zookeeper (1 zookeeper read)
This can have a larger impact than one might initially suspect. For
instance, a broker only populates its MetadataCache after it has joined the
cluster and the controller sends it an UpdateMetadataRequest. But a broker can
begin processing requests even before registering itself in zookeeper (before
the controller even knows the broker is alive). In other words, a broker can
begin processing MetadataRequests before processing the controller's
UpdateMetadataRequest following broker registration.
Processing these MetadataRequests in this scenario leads to large request
local times and can cause substantial request queue backup, significantly
delaying the broker's processing of its initial UpdateMetadataRequest. Since
the broker hasn't received any UpdateMetadataRequest from the controller yet,
its MetadataCache is empty, so the topics from all the client MetadataRequests
are treated as brand new topics and the broker tries to auto-create them. For
each pre-existing topic queried in the MetadataRequest, auto topic creation
performs the N+2 zookeeper reads mentioned earlier.
In one bad production scenario (while recovering from KAFKA-4959), this
caused a significant delay in bringing replicas online, as both the initial
LeaderAndIsrRequest and UpdateMetadataRequest from the controller on broker
startup were stuck behind these client MetadataRequests hammering zookeeper.
We can reduce the N+2 reads down to 1 by checking topic existence upfront.
----
> AdminUtils.createTopic should check topic existence upfront
> -----------------------------------------------------------
>
> Key: KAFKA-5323
> URL: https://issues.apache.org/jira/browse/KAFKA-5323
> Project: Kafka
> Issue Type: Improvement
> Reporter: Onur Karaman
> Assignee: Onur Karaman
>
> When a topic exists, AdminUtils.createTopic unnecessarily does N+2 zookeeper
> reads where N is the number of brokers. Here is the breakdown of the N+2
> zookeeper reads:
> 1. reads the current list of brokers in zookeeper (1 zookeeper read)
> 2. reads metadata for each broker in zookeeper (N zookeeper reads where N is
> the number of brokers)
> 3. checks for topic existence in zookeeper (1 zookeeper read)
> We can reduce the N+2 reads down to 1 by checking topic existence upfront.