[ 
https://issues.apache.org/jira/browse/KAFKA-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213709#comment-15213709
 ] 

ASF GitHub Bot commented on KAFKA-3436:
---------------------------------------

GitHub user becketqin opened a pull request:

    https://github.com/apache/kafka/pull/1149

    KAFKA-3436: Speed up controlled shutdown.

    This patch does the following:
    1. Batches LeaderAndIsrRequest and UpdateMetadataRequest during controlled 
shutdown.
    2. Adds async read and write methods to an extended ZkClient, and uses the 
async ZK operations for LeaderAndIsr reads and updates. The async methods can be 
used in other places as well (e.g. preferred leader election, replica 
reassignment, controller bootstrap, etc.), but those are out of the scope of 
this ticket.
    
    In rolling bounce tests, a controlled shutdown involving 2,500 
partitions now takes around 3 seconds; previously it could take more than 30 
seconds.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/becketqin/kafka KAFKA-3436

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/1149.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1149
    
----
commit c2d22821c6c3ad7aa45090def6b984719209f5af
Author: Jiangjie Qin <becket....@gmail.com>
Date:   2016-03-27T21:29:30Z

    KAFKA-3436: Speed up controlled shutdown

commit 7e7cf3fb1fc4a44d7af4ea935b38bf2e90e6cadd
Author: Jiangjie Qin <becket....@gmail.com>
Date:   2016-03-28T00:47:22Z

    Remove pre-sent StopReplicaRequests and split state transition into 
multiple groups.

----


> Speed up controlled shutdown.
> -----------------------------
>
>                 Key: KAFKA-3436
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3436
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 0.9.0.0
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>             Fix For: 0.10.1.0
>
>
> Currently, rolling bouncing a Kafka cluster with tens of thousands of partitions 
> can take very long (~2 min for each broker with ~5000 partitions/broker in 
> our environment). The majority of the time is spent on shutting down a 
> broker. The time to shut down a broker usually includes the following 
> parts:
> T1: During controlled shutdown, people usually want to make sure there are no 
> under-replicated partitions. So shutting down a broker during a rolling 
> bounce has to wait for the previously restarted broker to catch up. This 
> is T1.
> T2: The time to send the controlled shutdown request and receive the controlled 
> shutdown response. Currently, a controlled shutdown request triggers 
> many LeaderAndIsrRequests and UpdateMetadataRequests, and also involves many 
> ZooKeeper updates performed serially.
> T3: The actual time to shut down all the components. It is usually small 
> compared with T1 and T2.
> T1 is related to:
> A) the inbound throughput on the cluster, and 
> B) the "down" time of the broker (the time between when the replica fetchers 
> stop and when they restart)
> The larger the traffic is, or the longer the broker stops fetching, the 
> longer it will take for the broker to catch up and get back into ISR, and 
> therefore the longer T1 will be. Assume:
> * the inbound network traffic is X bytes/second on a broker
> * the time T1.B ("down" time) mentioned above is T
> Theoretically it will take (X * T) / (NetworkBandwidth - X) = 
> InboundNetworkUtilization * T / (1 - InboundNetworkUtilization) for the 
> broker to catch up after the restart. While X is out of our control, T is 
> largely related to T2.
> The purpose of this ticket is to reduce T2 by:
> 1. Batching the LeaderAndIsrRequest and UpdateMetadataRequest during 
> controlled shutdown.
> 2. Using async ZooKeeper writes to pipeline the ZooKeeper updates. According 
> to the ZooKeeper wiki (https://wiki.apache.org/hadoop/ZooKeeper/Performance), 
> a 3-node ZK cluster should be able to handle about 20K writes per second (1 KB 
> each). So if we use async writes, we should be able to reduce the ZooKeeper 
> update time to low seconds or even the sub-second level.
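The catch-up estimate quoted above can be sanity-checked with quick arithmetic (the traffic numbers below are made up for illustration, not from the ticket):

```python
def catch_up_time(inbound_bps, down_time_s, bandwidth_bps):
    """T1 estimate from the description: (X * T) / (NetworkBandwidth - X)."""
    return (inbound_bps * down_time_s) / (bandwidth_bps - inbound_bps)

# e.g. 50 MB/s inbound on a 125 MB/s (gigabit) NIC, broker down for 30 s:
MB = 1024 * 1024
t1 = catch_up_time(50 * MB, 30, 125 * MB)
print(round(t1, 1))  # 50 * 30 / (125 - 50) = 20.0 seconds to rejoin ISR
```

The utilization form gives the same answer (0.4 * 30 / 0.6 = 20 s), and it shows why shrinking T2, and hence the down time T, directly shrinks T1 as well.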



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)