Neha Narkhede created ZOOKEEPER-1457:
----------------------------------------
Summary: Ephemeral node deleted for unexpired sessions
Key: ZOOKEEPER-1457
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1457
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.4
Reporter: Neha Narkhede
This week, we saw a potential bug with ZooKeeper 3.3.4. In an attempt to add
a separate disk for ZooKeeper transaction logs, our SysOps team threw new disks
at all the ZooKeeper servers in our production cluster at around the same time.
Right after this, we saw degraded performance on our ZooKeeper cluster. And
yes, I agree that this degraded behavior is expected and that we could have done
a better job by upgrading one server at a time. However, the observed impact
was that ephemeral nodes got deleted without session expiration on the
ZooKeeper clients.
Let me try to describe what I've observed from the Kafka and ZK server logs.
A Kafka client has a session established with ZK, say session A, that it has
been using successfully. During the period of degraded ZK performance, session A
expires. Kafka's ZkClient tries to establish another session with ZK. After 9
seconds, it establishes a session, say session B, and tries to use it to
create a znode. This operation fails with a NodeExists error because another
session, say session C, has already created that znode. This is considered OK,
since ZkClient transparently retries an operation if it gets disconnected, and
such a retry can sometimes return NodeExists. But later, session C expires and
hence the ephemeral node is deleted from ZK. This leads to unexpected errors in
Kafka, since its session, session B, is still valid and hence it expects the
znode to be there. The issue is that session C was established, created the
znode, and expired without the ZooKeeper client on Kafka ever knowing about it.
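
To make the sequence of events concrete, here is a minimal sketch that replays it against a toy in-memory model. It does not use the real ZooKeeper or ZkClient API: the ToyEnsemble class, the NodeExists exception class, the replay function, and the znode path are all hypothetical names introduced only for illustration. The session labels A, B, and C follow the description above.

```python
from dataclasses import dataclass, field

ZNODE = "/demo/ephemeral"  # hypothetical path, not taken from the report


class NodeExists(Exception):
    """Stand-in for ZooKeeper's NodeExists error."""


@dataclass
class ToyEnsemble:
    """In-memory toy model of sessions and ephemeral znodes (not the real API)."""
    live: set = field(default_factory=set)
    owners: dict = field(default_factory=dict)  # path -> owning session id

    def open_session(self, sid):
        self.live.add(sid)

    def expire_session(self, sid):
        # Expiry removes the session and every ephemeral znode it owns.
        self.live.discard(sid)
        self.owners = {p: o for p, o in self.owners.items() if o != sid}

    def create_ephemeral(self, sid, path):
        if sid not in self.live:
            raise RuntimeError(f"session {sid} is not live")
        if path in self.owners:
            raise NodeExists(path)
        self.owners[path] = sid


def replay():
    zk, log = ToyEnsemble(), []
    zk.open_session("A")
    zk.expire_session("A")            # session A expires during the slowdown
    zk.open_session("C")              # session C: never surfaced to the client
    zk.create_ephemeral("C", ZNODE)
    zk.open_session("B")              # the session the client believes it holds
    try:
        zk.create_ephemeral("B", ZNODE)
    except NodeExists:
        log.append("B: NodeExists")
        # Defensive check: is the existing znode owned by our session?
        log.append(f"owner={zk.owners[ZNODE]} current=B")
    zk.expire_session("C")            # deletes the znode that B relies on
    log.append(f"znode present={ZNODE in zk.owners}")
    log.append(f"B live={'B' in zk.live}")
    return log


if __name__ == "__main__":
    for line in replay():
        print(line)
```

The replay ends with session B still live but the znode gone, which is the state described above. The owner comparison in the NodeExists branch is only one possible way a client could notice the mismatch; with the real Java client, the analogous check would presumably compare Stat.getEphemeralOwner() of the existing znode against ZooKeeper.getSessionId().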