[ 
https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16185434#comment-16185434
 ] 

Prasanna Gautam commented on KAFKA-5473:
----------------------------------------

[~ijuma] I added a new configuration that's consistent with what [~junrao] was 
mentioning previously: zookeeper.connection.retry.timeout.ms, which sets an 
upper bound on how long to wait before killing the connection and triggering 
the shutdown. This is looking like a bigger structural change than I'd 
originally anticipated, so I want to make sure I'm on the right track. Since 
ZkUtils is initialized and needs to be closed/reconnected in the ZKServer 
object, does it make sense to pass the state of the connection to the 
KafkaServer so that the timeout can be guaranteed and the services shut down 
cleanly?  
This is different from other examples in the codebase where ZK is used to share 
state, but since this case involves ZK not being available, we need a different 
mechanism to tell KafkaServer that it needs to start reconnecting and then use 
the ZkUtils instance thereafter. If the reconnect retry timeout has been 
reached, it should start the shutdown process. The IZkStateListener is used in 
multiple places in the code, and I think it's easier to add another class, such 
as ZKSessionTimeoutRecovery, that only handles reconnects and exits cleanly if 
they fail. 
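
To make the proposal concrete, here is a minimal sketch of what such a class 
could look like. It is not code from the patch: the constructor arguments, the 
backoff parameter, and the method name onSessionExpired are assumptions made 
for illustration, and the reconnect and shutdown actions are passed in as 
plain callbacks rather than real ZkUtils or KafkaServer calls. The timeout 
value would come from zookeeper.connection.retry.timeout.ms.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of the proposed class; not part of Kafka. */
final class ZKSessionTimeoutRecovery {
    private final BooleanSupplier reconnect; // assumed: returns true once a new ZK session exists
    private final Runnable shutdown;         // assumed: starts a clean broker shutdown
    private final long retryTimeoutMs;       // from zookeeper.connection.retry.timeout.ms
    private final long backoffMs;            // assumed pause between attempts

    ZKSessionTimeoutRecovery(BooleanSupplier reconnect, Runnable shutdown,
                             long retryTimeoutMs, long backoffMs) {
        this.reconnect = reconnect;
        this.shutdown = shutdown;
        this.retryTimeoutMs = retryTimeoutMs;
        this.backoffMs = backoffMs;
    }

    /**
     * Retries the reconnect until it succeeds or the retry timeout elapses.
     * Returns true if a new session was established; otherwise runs the
     * shutdown callback and returns false.
     */
    boolean onSessionExpired() {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(retryTimeoutMs);
        while (true) {
            if (reconnect.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                break; // upper bound reached, stop retrying
            }
            long remainingMs = Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
            try {
                Thread.sleep(Math.min(backoffMs, remainingMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                break;
            }
        }
        shutdown.run();
        return false;
    }
}
```

In this shape KafkaServer would only need to supply the two callbacks, which 
keeps the existing IZkStateListener implementations untouched.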

> handle ZK session expiration properly when a new session can't be established
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-5473
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5473
>             Project: Kafka
>          Issue Type: Sub-task
>    Affects Versions: 0.9.0.0
>            Reporter: Jun Rao
>            Assignee: Prasanna Gautam
>             Fix For: 1.0.0
>
>
> In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic for 
> handling ZK session expiration a bit. If a new ZK session can't be 
> established after session expiration, we just log an error and continue. 
> However, this can leave the broker in a bad state since it's up, but not 
> registered from the controller's perspective. Replicas on this broker may 
> never be in sync.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
