[
https://issues.apache.org/jira/browse/ZOOKEEPER-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262902#comment-13262902
]
Neha Narkhede commented on ZOOKEEPER-1457:
------------------------------------------
Logs are uploaded here:
http://people.apache.org/%7Enehanarkhede/kafka-misc/zookeeper-outage-2012-04-23/kafka-zk-bug-2012-04-23.tar.gz
> Ephemeral node deleted for unexpired sessions
> ---------------------------------------------
>
> Key: ZOOKEEPER-1457
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1457
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.3.4
> Reporter: Neha Narkhede
>
> This week, we saw a potential bug with zookeeper 3.3.4. In an attempt to
> add a separate disk for zookeeper transaction logs, our SysOps team threw
> new disks at all the zookeeper servers in our production cluster at around
> the same time. Right after this, we saw degraded performance on our zookeeper
> cluster. And yes, I agree that this degraded behavior is expected and we
> could've done a better job and upgraded one server at a time. However, the
> observed impact was that ephemeral nodes got deleted without the zookeeper
> clients ever seeing their sessions expire.
> Let me try to describe what I've observed from the Kafka and ZK server logs:
> - A Kafka client has a session established with ZK, say Session A, that it
> has been using successfully. During the degraded ZK performance, Session A
> expires. Kafka's ZkClient tries to establish another session with ZK. After
> 9 seconds, it establishes a session, say Session B, and tries to use it to
> create a znode. This operation fails with a NodeExists error because another
> session, say Session C, has already created that znode. This is considered
> OK since ZkClient retries an operation transparently if it gets
> disconnected, so a retried create can sometimes return NodeExists. But
> later, Session C expires and hence the ephemeral node is deleted from ZK.
> This leads to unexpected errors in Kafka, since its session, Session B, is
> still valid and hence it expects the znode to be there. The issue is that
> Session C was established, created the znode, and expired without the
> zookeeper client on Kafka ever knowing about it.
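The sequence above can be replayed in a small in-memory model. This is a sketch under stated assumptions, not ZooKeeper or ZkClient code: ToyZooKeeper, NodeExistsError, naive_create, owner_checked_create and the path "/hypothetical/node" are all hypothetical names made up for illustration. naive_create mimics a wrapper that treats NodeExists as success. owner_checked_create is one possible client-side safeguard that the report does not propose: it reports success only if the caller's own session owns the ephemeral node. In the real Java client, Stat.getEphemeralOwner() and ZooKeeper.getSessionId() could support a similar comparison.

```python
from dataclasses import dataclass, field


class NodeExistsError(Exception):
    """Hypothetical stand-in for ZooKeeper's NodeExists error (simulation only)."""


@dataclass
class ToyZooKeeper:
    """Hypothetical in-memory model of sessions and ephemeral znodes."""
    live_sessions: set = field(default_factory=set)
    # path -> id of the session that owns the ephemeral node
    ephemeral_owner: dict = field(default_factory=dict)
    next_session_id: int = 1

    def open_session(self) -> int:
        sid = self.next_session_id
        self.next_session_id += 1
        self.live_sessions.add(sid)
        return sid

    def expire_session(self, sid: int) -> None:
        # Expiring a session deletes every ephemeral node that session owns.
        self.live_sessions.discard(sid)
        owned = [p for p, owner in self.ephemeral_owner.items() if owner == sid]
        for path in owned:
            del self.ephemeral_owner[path]

    def create_ephemeral(self, sid: int, path: str) -> None:
        if sid not in self.live_sessions:
            raise RuntimeError("session expired")
        if path in self.ephemeral_owner:
            raise NodeExistsError(path)
        self.ephemeral_owner[path] = sid

    def exists(self, path: str) -> bool:
        return path in self.ephemeral_owner


def naive_create(zk: ToyZooKeeper, sid: int, path: str) -> None:
    """Treat NodeExists as success, assuming it came from our own earlier retry."""
    try:
        zk.create_ephemeral(sid, path)
    except NodeExistsError:
        pass


def owner_checked_create(zk: ToyZooKeeper, sid: int, path: str) -> bool:
    """Return True only if this session ends up owning the ephemeral node."""
    try:
        zk.create_ephemeral(sid, path)
    except NodeExistsError:
        pass
    return zk.ephemeral_owner.get(path) == sid


if __name__ == "__main__":
    zk = ToyZooKeeper()
    a = zk.open_session()
    zk.expire_session(a)                  # Session A expires
    c = zk.open_session()                 # Session C, unknown to the client
    zk.create_ephemeral(c, "/hypothetical/node")
    b = zk.open_session()                 # Session B, the one the client uses
    naive_create(zk, b, "/hypothetical/node")
    print(zk.exists("/hypothetical/node"))  # True, but owned by Session C
    zk.expire_session(c)
    print(zk.exists("/hypothetical/node"))  # False, although Session B is live
```

In this model, owner_checked_create(zk, b, "/hypothetical/node") returns False while Session C still holds the node, so the client would at least learn that the znode is not its own. The model says nothing about how Session C came to exist on the server side, which is the actual bug being reported.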
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira