[ https://issues.apache.org/jira/browse/ZOOKEEPER-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262902#comment-13262902 ]

Neha Narkhede commented on ZOOKEEPER-1457:
------------------------------------------

Logs are uploaded here:
http://people.apache.org/%7Enehanarkhede/kafka-misc/zookeeper-outage-2012-04-23/kafka-zk-bug-2012-04-23.tar.gz
                
> Ephemeral node deleted for unexpired sessions
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1457
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1457
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.3.4
>            Reporter: Neha Narkhede
>
> This week, we saw a potential bug with ZooKeeper 3.3.4. In an attempt to
> add a separate disk for ZooKeeper transaction logs, our SysOps team added
> new disks to all the ZooKeeper servers in our production cluster at around
> the same time. Right after this, we saw degraded performance on our ZooKeeper
> cluster. I agree that this degraded behavior is expected, and we could have
> done a better job by upgrading one server at a time. However, the observed
> impact was that ephemeral nodes got deleted without the ZooKeeper clients
> ever seeing their sessions expire.
>
> Let me try to describe what I've observed from the Kafka and ZK server logs.
> The Kafka client has a session established with ZK, say Session A, that it
> has been using successfully. During the degraded ZK performance, Session A
> expires. Kafka's ZkClient tries to establish another session with ZK. After
> 9 seconds, it establishes a session, say Session B, and tries to use it to
> create a znode. This operation fails with a NodeExists error because another
> session, say Session C, has already created that znode. This is considered
> OK, since ZkClient retries an operation transparently if it gets
> disconnected, so a retried create sometimes returns NodeExists. But later,
> Session C expires and the ephemeral node is deleted from ZK. This leads to
> unexpected errors in Kafka, since its session, Session B, is still valid and
> it therefore expects the znode to be there. The issue is that Session C was
> established, created the znode and expired without the ZooKeeper client in
> Kafka ever knowing about it.
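
The following is a minimal Python sketch of the sequence described above. It is a toy, single-threaded model and not the real ZooKeeper or ZkClient API. The names ToyZooKeeper, create_lenient, create_checked and replay_scenario, as well as the path /brokers/ids/0, are all hypothetical. Each ephemeral node records the session that created it, much like the ephemeralOwner field of a real znode Stat. The sketch replays Sessions A, C and B and compares two behaviors: treating NodeExists as success, and checking whether the existing node is owned by the current session.

```python
class NodeExistsError(Exception):
    """Hypothetical stand-in for ZooKeeper's NodeExists error."""


class ToyZooKeeper:
    """Hypothetical in-memory model: ephemeral znodes keyed by path,
    each remembering the id of the session that created it."""

    def __init__(self):
        self._next_session = 1
        self.live_sessions = set()
        self.nodes = {}  # path -> owning session id

    def open_session(self):
        sid = self._next_session
        self._next_session += 1
        self.live_sessions.add(sid)
        return sid

    def expire_session(self, sid):
        # Expiring a session deletes every ephemeral node it owns.
        self.live_sessions.discard(sid)
        self.nodes = {p: o for p, o in self.nodes.items() if o != sid}

    def create_ephemeral(self, sid, path):
        if sid not in self.live_sessions:
            raise RuntimeError("session expired")
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = sid

    def ephemeral_owner(self, path):
        # Owning session id, or None if the node does not exist.
        return self.nodes.get(path)


def create_lenient(zk, sid, path):
    """Treat NodeExists as success, as a transparent retry would."""
    try:
        zk.create_ephemeral(sid, path)
    except NodeExistsError:
        pass
    return True


def create_checked(zk, sid, path):
    """On NodeExists, succeed only if this session owns the node."""
    try:
        zk.create_ephemeral(sid, path)
        return True
    except NodeExistsError:
        return zk.ephemeral_owner(path) == sid


def replay_scenario(create):
    """Replay the reported sequence. Returns (client believes it owns the
    node, node still exists after Session C expires)."""
    zk = ToyZooKeeper()
    path = "/brokers/ids/0"            # hypothetical path
    a = zk.open_session()
    zk.create_ephemeral(a, path)
    zk.expire_session(a)               # Session A expires; its node is deleted
    c = zk.open_session()
    zk.create_ephemeral(c, path)       # Session C creates the node unnoticed
    b = zk.open_session()
    believed_ok = create(zk, b, path)  # Session B gets NodeExists
    zk.expire_session(c)               # Session C expires; node is deleted
    return believed_ok, zk.ephemeral_owner(path) is not None
```

In the lenient version, Session B believes it holds the znode, but the znode is gone once Session C expires. The checked version reports that B does not own the node. The ownership check does not address the server-side issue reported here. It only lets a client notice that an existing ephemeral node belongs to another session. With the real Java client, the analogous comparison would be between Stat.getEphemeralOwner() and ZooKeeper.getSessionId().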
