[ https://issues.apache.org/jira/browse/KAFKA-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673651#comment-13673651 ]

Jun Rao commented on KAFKA-927:
-------------------------------

Thanks for patch v3. A few more comments:

30. KafkaServer:
30.1 Could you combine isShuttingDown and startupComplete?
30.2 In controlledShutdown(), it's not clear that caching the socket channel is 
worth it. Technically, the controller could come back as a broker with the same 
id but a different host/port. It's simpler to always close the socket channel 
after each ControlledShutdownRequest and create a new channel on retry.
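A minimal sketch of the retry pattern suggested in 30.2, written in Java for illustration. `ShutdownChannel`, `ChannelFactory`, and `ControlledShutdownSketch` are hypothetical names, not Kafka APIs, and the factory is assumed to look up the current controller's host/port on every call. Each attempt opens a fresh channel, and try-with-resources closes it whether or not the request succeeds:

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical abstraction, not a Kafka class.
interface ShutdownChannel extends Closeable {
    /** Sends a ControlledShutdownRequest for brokerId; true if the controller accepted it. */
    boolean sendControlledShutdown(int brokerId) throws IOException;
}

// Hypothetical abstraction, not a Kafka class.
interface ChannelFactory {
    /** Opens a new channel to the current controller, re-resolving its host/port on each call. */
    ShutdownChannel connectToCurrentController() throws IOException;
}

final class ControlledShutdownSketch {
    static boolean controlledShutdown(ChannelFactory factory, int brokerId, int maxRetries) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            // No cached channel: the controller may have moved to a new host/port since the last try.
            try (ShutdownChannel channel = factory.connectToCurrentController()) {
                if (channel.sendControlledShutdown(brokerId)) {
                    return true; // the channel is still closed on the way out
                }
            } catch (IOException e) {
                // Connect, send, or close failures count as a failed attempt; fall through to retry.
            }
        }
        return false;
    }
}
```

Backoff between retries is omitted to keep the sketch short; the point is only that no channel outlives a single attempt.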

31. KafkaController:
31.1 Remove the unused import java.util.concurrent.{Semaphore
31.2 I think we still need to reset shuttingDownBrokerIds to empty in 
onControllerFailover(). A controller may fail over during a controlled shutdown 
and later regain the controllership. onBrokerFailure() is only called while the 
controller is active, so shuttingDownBrokerIds may not be empty when the 
controllership switches back.
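The failure mode in 31.2 can be reproduced with a simplified, hypothetical controller model in Java. `ControllerSketch` and its methods only mirror the names used in the comment; they are not Kafka's implementation. Because onBrokerFailure() does nothing while the node is not the active controller, ids recorded during an earlier term survive resignation, and only the clear() in onControllerFailover() removes them:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, heavily simplified model; not Kafka's KafkaController.
final class ControllerSketch {
    // Brokers this node believes are in the middle of a controlled shutdown.
    final Set<Integer> shuttingDownBrokerIds = new HashSet<>();
    private boolean active = false;

    void shutdownBroker(int brokerId) {
        if (active) shuttingDownBrokerIds.add(brokerId);
    }

    void onBrokerFailure(Set<Integer> deadBrokers) {
        // Mirrors the comment: only invoked (here: only effective) while active.
        if (!active) return;
        shuttingDownBrokerIds.removeAll(deadBrokers);
    }

    void onControllerResignation() {
        active = false; // nothing here clears shuttingDownBrokerIds
    }

    void onControllerFailover() {
        active = true;
        // Drop ids left over from a previous term as controller.
        shuttingDownBrokerIds.clear();
    }
}
```

Without the clear(), a broker that died while this node was not the controller would stay in shuttingDownBrokerIds indefinitely.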
                
> Integrate controlled shutdown into kafka shutdown hook
> ------------------------------------------------------
>
>                 Key: KAFKA-927
>                 URL: https://issues.apache.org/jira/browse/KAFKA-927
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Sriram Subramanian
>            Assignee: Sriram Subramanian
>         Attachments: KAFKA-927.patch, KAFKA-927-v2.patch, 
> KAFKA-927-v2-revised.patch, KAFKA-927-v3.patch
>
>
> The controlled shutdown mechanism should be integrated into the Kafka shutdown 
> hook for operational benefits. Also, a few optimizations can be done to reduce 
> unnecessary RPC and ZK calls. This patch has been tested in a production-like 
> environment by doing rolling bounces continuously for a day. For a 7-node 
> cluster, the average time of a rolling bounce with controlled shutdown is 340 
> seconds without this patch; with this patch it drops to 220 seconds. 
> It also ensures correctness in scenarios where the controller shrinks the ISR 
> and the new leader could place the broker being shut down back into the ISR.
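One way to state the ISR invariant from the description: a leader should never re-add a broker that is being shut down. The Java sketch below assumes, hypothetically, that the leader can see the set of shutting-down broker ids; `IsrGuardSketch` and `maybeExpandIsr` are illustrative names, and this shows the invariant only, not how the patch enforces it:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper; not Kafka's ISR handling.
final class IsrGuardSketch {
    /**
     * Returns the ISR a leader would use after a follower catches up,
     * refusing to re-add a broker that is in controlled shutdown.
     * The input set is never modified.
     */
    static Set<Integer> maybeExpandIsr(Set<Integer> currentIsr, int caughtUpFollower,
                                       Set<Integer> shuttingDownBrokerIds) {
        Set<Integer> newIsr = new LinkedHashSet<>(currentIsr);
        if (!shuttingDownBrokerIds.contains(caughtUpFollower)) {
            newIsr.add(caughtUpFollower);
        }
        return newIsr;
    }
}
```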

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
