[ https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16185434#comment-16185434 ]
Prasanna Gautam commented on KAFKA-5473:
----------------------------------------

[~ijuma] I added a new configuration consistent with what [~junrao] mentioned previously: zookeeper.connection.retry.timeout.ms, which sets an upper bound on how long to wait before killing the connection and triggering the shutdown.

This is looking like a bigger structural change than I'd originally anticipated, so I want to make sure I'm on the right track. Since ZkUtils is initialized and needs to be closed/reconnected in the ZKServer object, does it make sense to pass the state of the connection to KafkaServer so that the timeout can be guaranteed and the services cleanly shut down? This is different from other examples in the codebase where ZK is used to share state, but since this involves ZK not being available, we need a different mechanism to tell KafkaServer that it needs to start reconnecting and then use the ZkUtils instance thereafter. If the reconnect retry timeout has been reached, it should start the shutdown process.

IZkStateListener is used in multiple places in the code, and I think it's easier to add another class, like ZKSessionTimeoutRecovery, that only handles reconnects and a clean exit if they fail.

> handle ZK session expiration properly when a new session can't be established
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-5473
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5473
>             Project: Kafka
>          Issue Type: Sub-task
>    Affects Versions: 0.9.0.0
>            Reporter: Jun Rao
>            Assignee: Prasanna Gautam
>             Fix For: 1.0.0
>
>
> In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic in
> handling ZK session expiration a bit. If a new ZK session can't be
> established after session expiration, we just log an error and continue.
> However, this can leave the broker in a bad state since it's up, but not
> registered from the controller's perspective. Replicas on this broker may
> never be in sync.

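
For illustration only, here is a rough sketch of the reconnect-then-shutdown behaviour proposed in the comment above. It is not the actual patch and does not use ZkUtils or IZkStateListener. The class name follows the comment's suggestion, but the constructor parameters, the backoff interval, and the reconnect/shutdown callbacks are assumptions made up for this example.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical helper: retries a reconnect until a deadline, then triggers shutdown.
 * The callbacks stand in for "re-establish the ZK session" and "shut the broker down".
 */
final class ZkSessionTimeoutRecovery {
    private final long retryTimeoutMs;       // assumed to come from zookeeper.connection.retry.timeout.ms
    private final long backoffMs;            // assumed pause between attempts
    private final BooleanSupplier reconnect; // returns true once a new session is established
    private final Runnable shutdown;         // invoked once if the deadline passes

    ZkSessionTimeoutRecovery(long retryTimeoutMs, long backoffMs,
                             BooleanSupplier reconnect, Runnable shutdown) {
        this.retryTimeoutMs = retryTimeoutMs;
        this.backoffMs = backoffMs;
        this.reconnect = reconnect;
        this.shutdown = shutdown;
    }

    /** Returns true if a reconnect succeeded, false if shutdown was triggered. */
    boolean recover() {
        long deadline = System.nanoTime() + retryTimeoutMs * 1_000_000L;
        do {
            if (reconnect.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(backoffMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt and stop retrying.
                Thread.currentThread().interrupt();
                break;
            }
        } while (System.nanoTime() - deadline < 0);
        shutdown.run();
        return false;
    }
}
```

In this sketch the caller (for example, whatever reacts to session expiration) would invoke recover() and continue using its ZK handle only if it returns true. How the connection state is passed to KafkaServer is left open, as in the comment.
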
-- This message was sent by Atlassian JIRA (v6.4.14#64029)