The behavior we saw on one of our ZooKeeper clients is as follows. The session expires on the client, and the client assumes its ephemeral nodes have been deleted, so it establishes a new session with ZooKeeper and tries to re-create them. However, when it tries to re-create an ephemeral node, ZooKeeper returns a NodeExists error code. A NodeExists error is legitimate during a session disconnect event (since zkclient automatically retries the operation, and the retry can raise NodeExists). Also, by design Kafka does not have multiple clients create the same ephemeral node, so the Kafka server assumes the NodeExists is normal. However, a few seconds later ZooKeeper deletes that ephemeral node. So from the client's perspective, even though it has a new, valid session, its ephemeral node is gone.
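To make the sequence concrete, here is a minimal sketch of one possible client-side guard: on NodeExists, compare the node's owning session with the current session before treating the error as harmless. This is a toy in-memory model, not the ZooKeeper or zkclient API. ToyZooKeeper, create_ephemeral_checked, the session ids, and the example path are all hypothetical names for illustration. With the real Java client, the equivalent information would come from Stat.getEphemeralOwner() and ZooKeeper.getSessionId().

```python
class NodeExistsError(Exception):
    """Raised when create_ephemeral() hits a path that is already present."""


class ToyZooKeeper:
    """In-memory stand-in for a ZooKeeper server (hypothetical, NOT the real API)."""

    def __init__(self):
        self.nodes = {}           # path -> id of the session owning the ephemeral node
        self.pending_expiry = []  # sessions reported expired but not yet cleaned up

    def create_ephemeral(self, path, session_id):
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = session_id

    def ephemeral_owner(self, path):
        # Returns the owning session id, or None if the node does not exist.
        return self.nodes.get(path)

    def expire_session(self, session_id):
        # Models the ordering described above: the client is told about the
        # expiration now, while deletion of its ephemeral nodes is deferred.
        self.pending_expiry.append(session_id)

    def run_cleanup(self):
        # The deferred deletion that happens "a few seconds later".
        for sid in self.pending_expiry:
            stale = [p for p, owner in self.nodes.items() if owner == sid]
            for path in stale:
                del self.nodes[path]
        self.pending_expiry.clear()


def create_ephemeral_checked(zk, path, session_id, wait_for_cleanup, max_attempts=5):
    """Create an ephemeral node, treating NodeExists as benign only if we own the node."""
    for _ in range(max_attempts):
        try:
            zk.create_ephemeral(path, session_id)
            return "created"
        except NodeExistsError:
            owner = zk.ephemeral_owner(path)
            if owner == session_id:
                return "already-owned"  # a retried create of ours already succeeded
            if owner is not None:
                # Node belongs to another (possibly expired) session:
                # wait for the server to remove it, then try again.
                wait_for_cleanup()
    raise RuntimeError("node at %s is still owned by another session" % path)


# Usage: session 100 expires, and the client reconnects as session 200.
zk = ToyZooKeeper()
zk.create_ephemeral("/brokers/ids/1", session_id=100)
zk.expire_session(100)  # client is notified, node still present
result = create_ephemeral_checked(zk, "/brokers/ids/1", 200, wait_for_cleanup=zk.run_cleanup)
```

In this model the first create fails with NodeExists because the node is still owned by session 100. The helper waits for the cleanup and retries, so the node ends up owned by session 200 instead of silently disappearing. In a real client the wait would be a backoff or a watch on the node rather than a direct call into the server.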
After poking at the transaction and log4j logs, I saw that the NodeExists occurred because the ZooKeeper leader had retained the ephemeral node from the previous, expired session. It turns out that the leader notified the client of the session expiration before actually deleting the ephemeral node. It is also worth noting that the previous session expired because of a long fsync operation on the ZooKeeper leader. Once the leader returned from the fsync, it had a whole bunch of sessions to expire. In this case, it seems that ZooKeeper should not notify the client that the session is expired until the ephemeral node information is actually gone. Or maybe I'm not clear on what guarantees ZooKeeper provides across sessions from the same client.

Thanks,
Neha
