[jira] [Commented] (KAFKA-927) Integrate controlled shutdown into kafka shutdown hook

Neha Narkhede (JIRA) Mon, 03 Jun 2013 11:10:48 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673380#comment-13673380
 ]


Neha Narkhede commented on KAFKA-927:
-------------------------------------

Thanks for the revised v2 patch. A few more comments:

1. KafkaServer
1.1 startupComplete should be either a volatile variable or an AtomicBoolean.
Two different threads call startup() and controlledShutdown(), which modify
startupComplete.
1.2 In controlledShutdown(), we need to handle the error codes in
ControlledShutdownResponse explicitly. The error code can be set while
partitionsRemaining is 0, which will lead to errors if only
partitionsRemaining is checked.

2. Partition

From previous review #4: if the broker has to ignore the become-follower
request anyway, does it make sense to process even part of it (truncating the
log, etc.)?

3. Regarding previous review #3, I meant that it is pointless for the
controller to do the ZK write. Right after the write, the follower has not yet
received the stop replica request and the leader has not yet received the
shrunk ISR, so the broker being shut down will get added back to the ISR. You
can verify from the logs that this happens. The write also makes controlled
shutdown very slow: in production we typically move ~1000 partitions off the
broker, and a ZK write can take ~20 ms, so roughly 20 seconds (1000 x 20 ms)
are wasted on the ZK writes alone. Instead, it is enough to let the leader
shrink the ISR by sending it the leader and ISR request. On the other hand, one
could argue that the OfflineReplica state change itself should be changed to
avoid the ZK write. But that is a bigger change, so we should avoid it for now.
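Points 1.1 and 1.2 can be illustrated with a small sketch. This is a hypothetical Java illustration, not code from the patch (KafkaServer itself is written in Scala); the class ShutdownSketch, the nested ShutdownResponse class (a stand-in for ControlledShutdownResponse), the NO_ERROR constant, and the maxRetries parameter are all assumptions. It shows a startupComplete flag shared safely between the thread running startup() and the thread running controlledShutdown(), and a completion check that looks at the error code before trusting partitionsRemaining.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical sketch only: names and types are illustrative, not Kafka's API.
public class ShutdownSketch {
    // Assumed "no error" code, standing in for the real error-code constant.
    static final short NO_ERROR = 0;

    // Hypothetical stand-in for ControlledShutdownResponse.
    static final class ShutdownResponse {
        final short errorCode;
        final int partitionsRemaining;

        ShutdownResponse(short errorCode, int partitionsRemaining) {
            this.errorCode = errorCode;
            this.partitionsRemaining = partitionsRemaining;
        }
    }

    // Set by the thread running startup(), read by the thread running
    // controlledShutdown(). AtomicBoolean makes the update visible across
    // threads; a volatile boolean would also work for a plain flag.
    private final AtomicBoolean startupComplete = new AtomicBoolean(false);

    public void startup() {
        // ... start the server's components here ...
        startupComplete.set(true);
    }

    // Shutdown is complete only if NO error is reported AND no partitions
    // remain. Checking partitionsRemaining alone would treat an error
    // response that carries partitionsRemaining == 0 as a success.
    static boolean isShutdownComplete(ShutdownResponse response) {
        if (response.errorCode != NO_ERROR) {
            return false; // handle the error explicitly, here by retrying
        }
        return response.partitionsRemaining == 0;
    }

    // sendRequest models one controlled shutdown round trip to the
    // controller; maxRetries is a hypothetical retry limit.
    public boolean controlledShutdown(Supplier<ShutdownResponse> sendRequest, int maxRetries) {
        if (!startupComplete.get()) {
            return false; // startup never finished, nothing to hand off
        }
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (isShutdownComplete(sendRequest.get())) {
                return true;
            }
        }
        return false;
    }
}
```

A volatile boolean is enough while the flag is only set and read; an AtomicBoolean becomes necessary if the code later needs an atomic check-and-set such as compareAndSet.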
                
> Integrate controlled shutdown into kafka shutdown hook
> ------------------------------------------------------
>
>                 Key: KAFKA-927
>                 URL: https://issues.apache.org/jira/browse/KAFKA-927
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Sriram Subramanian
>            Assignee: Sriram Subramanian
>         Attachments: KAFKA-927.patch, KAFKA-927-v2.patch, 
> KAFKA-927-v2-revised.patch
>
>
> The controlled shutdown mechanism should be integrated into the software for
> operational benefits. Also, a few optimizations can be done to reduce
> unnecessary RPC and ZK calls. This patch has been tested in a production-like
> environment by doing rolling bounces continuously for a day. For a cluster
> with 7 nodes, the average time of a rolling bounce with controlled shutdown
> is 340 seconds without this patch and 220 seconds with it. It also ensures
> correctness in scenarios where the controller shrinks the ISR and the new
> leader could place the broker being shut down back into the ISR.

