[ https://issues.apache.org/jira/browse/ZOOKEEPER-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264585#comment-13264585 ]
Camille Fournier commented on ZOOKEEPER-1457:
---------------------------------------------
Hi Neha,
I'm trying to dig into this issue a bit, but I'm a little unclear on what the
problem is. You have a node that was created by session C. When session C
expires, the node is deleted. You don't expect it to be deleted, but I'm not
sure why. Is it because you didn't know about session C? Is it because session
B didn't get the right information about the node?
What does this mean in the context of the bug?
{quote}Since the leader processes create session and create znode for Session C
first, shouldn't it be the session id that gets returned to the client as
create session response? Does this sound like a bug?{quote}
Session C seems to be the owner of the node, and you've got a closeSession for
C, so is it really deleting the node for an unexpired session?
> Ephemeral node deleted for unexpired sessions
> ---------------------------------------------
>
> Key: ZOOKEEPER-1457
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1457
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.3.4
> Reporter: Neha Narkhede
>
> This week, we saw a potential bug with ZooKeeper 3.3.4. In an attempt to
> add a separate disk for ZooKeeper transaction logs, our SysOps team threw
> new disks at all the ZooKeeper servers in our production cluster at around
> the same time. Right after this, we saw degraded performance on our ZooKeeper
> cluster. And yes, I agree that this degraded behavior is expected and we
> could've done a better job and upgraded one server at a time. However, the
> observed impact was that ephemeral nodes got deleted even though the
> ZooKeeper clients never saw their sessions expire.
> Let me try to describe what I've observed from the Kafka and ZK server logs:
> - The Kafka client has a session established with ZK, say Session A, that it
> has been using successfully. During the degraded ZK performance, Session A
> expires, and Kafka's ZkClient tries to establish another session with ZK.
> After 9 seconds, it establishes a session, say Session B, and tries to use it
> to create a znode. This operation fails with a NodeExists error because
> another session, say Session C, has already created that znode. This is
> considered OK since ZkClient retries an operation transparently if it gets
> disconnected, so a retried create can sometimes return NodeExists. But
> later, Session C expires and hence the ephemeral node is deleted from ZK.
> This leads to unexpected errors in Kafka, since its session, Session B, is
> still valid and it therefore expects the znode to be there. The issue is that
> Session C was established, created the znode, and expired without the
> ZooKeeper client on Kafka ever knowing about it.
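To make the reported sequence concrete, here is a minimal toy model in Python. It assumes only ZooKeeper's documented rule that an ephemeral znode belongs to the session that created it and is removed when that session closes or expires. `ToyZooKeeper`, `replay`, and the path `/hypothetical/ephemeral` are made up for illustration; this is not the ZooKeeper or ZkClient API, and it makes no attempt to explain how session C came to exist.

```python
# Hypothetical toy model of ZooKeeper ephemeral-node ownership; not the real
# ZooKeeper or ZkClient API. It replays the sequence reported in ZOOKEEPER-1457.


class NodeExistsError(Exception):
    """Stand-in for ZooKeeper's NodeExists error."""


class ToyZooKeeper:
    def __init__(self):
        self._next_id = 0
        self.live_sessions = set()
        # path -> id of the session that created it (like Stat.ephemeralOwner)
        self.ephemeral_owner = {}

    def create_session(self):
        self._next_id += 1
        self.live_sessions.add(self._next_id)
        return self._next_id

    def create_ephemeral(self, session_id, path):
        if session_id not in self.live_sessions:
            raise ValueError(f"session {session_id} is not live")
        if path in self.ephemeral_owner:
            raise NodeExistsError(path)
        self.ephemeral_owner[path] = session_id

    def expire(self, session_id):
        # Expiry (or closeSession) removes only the ephemerals this session owns.
        self.live_sessions.discard(session_id)
        owned = [p for p, owner in self.ephemeral_owner.items() if owner == session_id]
        for path in owned:
            del self.ephemeral_owner[path]


def replay(path="/hypothetical/ephemeral"):
    zk = ToyZooKeeper()
    a = zk.create_session()       # Session A: Kafka's original session
    zk.expire(a)                  # A expires while the cluster is degraded
    c = zk.create_session()       # Session C: the Kafka client never learns of it
    zk.create_ephemeral(c, path)  # C creates the znode
    b = zk.create_session()       # Session B: the session ZkClient actually holds
    try:
        zk.create_ephemeral(b, path)
        result = "created"
    except NodeExistsError:
        result = "NodeExists"     # ZkClient treats this as a harmless retry
    owner = zk.ephemeral_owner[path]
    zk.expire(c)                  # C expires, taking the znode with it
    return {
        "create_result": result,
        "owned_by_b": owner == b,
        "b_still_live": b in zk.live_sessions,
        "node_exists": path in zk.ephemeral_owner,
    }


if __name__ == "__main__":
    print(replay())
    # prints {'create_result': 'NodeExists', 'owned_by_b': False, 'b_still_live': True, 'node_exists': False}
```

Under this model no node is deleted on behalf of an unexpired session: the znode belongs to C, so it disappears when C expires, while B stays live but never owned it. In the real Java client, the analogous check after a NodeExists result would be comparing the node's `Stat.getEphemeralOwner()` with the client's own `ZooKeeper.getSessionId()`.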