[
https://issues.apache.org/jira/browse/ZOOKEEPER-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jun Rao updated ZOOKEEPER-2919:
-------------------------------
Attachment: zk-myid-5.log
zk-myid-4.log
Attached are the logs from 2 ZK servers.
> expired ephemeral node reappears after ZK leader change
> -------------------------------------------------------
>
> Key: ZOOKEEPER-2919
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2919
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.9
> Reporter: Jun Rao
> Attachments: zk-myid-4.log, zk-myid-5.log
>
>
> We found the following issue when using ZK. A client (a Kafka broker)
> registered an ephemeral node in ZK. The client then received a session
> expiration event and created the new session. The client tried to create the
> same ephemeral node in ZK in the new session but received a
> NodeExistException. The following are the details.
> From Kafka broker 1:
> Broker 1 received the expiration of session 55bcff0f02d0002 at 13:33:26.
> {code:java}
> [2017-07-29 13:33:26,706] INFO Unable to reconnect to ZooKeeper service,
> session 0x55bcff0f02d0002 has expired, closing socket connection
> (org.apache.zookeeper.ClientCnxn)
> {code}
> It then established a new session 55d8f690ca20038 at 13:33:33.
> {code:java}
> [2017-07-29 13:33:33,405] INFO Session establishment complete on server
> rdalnydbbdqs10/10.122.104.12:2181, sessionid = 0x55d8f690ca20038, negotiated
> timeout = 6000 (org.apache.zookeeper.ClientCnxn)
> {code}
> However, the re-registration of the broker id fails.
> {code:java}
> [2017-07-29 13:33:33,408] INFO Result of znode creation is: NODEEXISTS
> (kafka.utils.ZKCheckedEphemeral)
> [2017-07-29 13:33:33,408] ERROR Error handling event ZkEvent[New session
> event sent to kafka.server.KafkaHealthcheck$SessionExpireListener@74ad6d14]
> (org.I0Itec.zkclient.ZkEvent
> Thread)
> java.lang.RuntimeException: A broker is already registered on the path
> /brokers/ids/1. This probably indicates that you either have configured a
> brokerid that is already in use, or else you have shutdown this broker and
> restarted it faster than the zookeeper timeout so it appears to be
> re-registering.
> at kafka.utils.ZkUtils.registerBrokerInZk(ZkUtils.scala:375)
> at kafka.utils.ZkUtils.registerBrokerInZk(ZkUtils.scala:361)
> at kafka.server.KafkaHealthcheck.register(KafkaHealthcheck.scala:71)
> at
> kafka.server.KafkaHealthcheck$SessionExpireListener.handleNewSession(KafkaHealthcheck.scala:105)
> at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:736)
> at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:72)
> {code}
> From ZK server (my id 4) :
> It expired the old session 55bcff0f02d0002 correctly before broker received
> the session expiration. It then went to ZK leader election soon after.
> {code:java}
> [2017-07-29 13:33:26,000] INFO Expiring session 0x55bcff0f02d0002, timeout of
> 6000ms exceeded (org.apache.zookeeper.server.ZooKeeperServer)
> [2017-07-29 13:33:26,019] INFO Processed session termination for sessionid:
> 0x55bcff0f02d0002 (org.apache.zookeeper.server.PrepRequestProcessor)
> [2017-07-29 13:33:33,582] INFO Shutting down
> (org.apache.zookeeper.server.quorum.CommitProcessor)
> [2017-07-29 13:33:34,344] INFO New election. My id = 4, proposed
> zxid=0x5830d1163b (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2017-07-29 13:34:22,499] INFO FOLLOWING - LEADER ELECTION TOOK - 48915
> (org.apache.zookeeper.server.quorum.Learner)
> {code}
> From ZK server (my id 5):
> It lost the connection to the old session 55bcff0f02d0002 before the session
> got expired. It then went into ZK leader election and became the leader.
> However, it didn't think the old session 55bcff0f02d0002 was expired after
> becoming the leader. Therefore, the new session 55d8f690ca20038 failed to
> create /brokers/ids/1. Only after that, it eventually expired the old session
> 55bcff0f02d0002.
> {code:java}
> [2017-07-29 13:33:24,216] WARN caught end of stream exception
> (org.apache.zookeeper.server.NIOServerCnxn)
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x55bcff0f02d0002, likely client has closed socket
> at
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
> at java.lang.Thread.run(Thread.java:745)
> 2017-07-29 13:33:24,216] INFO Closed socket connection for client
> /10.122.73.147:59615 which had sessionid 0x55bcff0f02d0002
> (org.apache.zookeeper.server.NIOServerCnxn)
> [2017-07-29 13:33:30,921] INFO New election. My id = 5, proposed
> zxid=0x5830d1113f (org.apache.zookeeper.server.quorum.FastLeaderElection)
> [2017-07-29 13:33:31,126] INFO LEADING - LEADER ELECTION TOOK - 1122
> (org.apache.zookeeper.server.quorum.Leader)
> [2017-07-29 13:33:33,405] INFO Established session 0x55d8f690ca20038 with
> negotiated timeout 6000 for client /10.122.73.147:47106
> (org.apache.zookeeper.server.ZooKeeperServer)
> [2017-07-29 13:33:33,407] INFO Got user-level KeeperException when processing
> sessionid:0x55d8f690ca20038 type:create cxid:0x5 zxid:0x5900000352 txntype:-1
> reqpath:n/a Error Path:/brokers/ids/1 Error:KeeperErrorCode = NodeExists for
> /brokers/ids/1 (org.apache.zookeeper.server.PrepRequestProcessor)
> [2017-07-29 13:33:40,002] INFO Expiring session 0x55bcff0f02d0002, timeout of
> 6000ms exceeded (org.apache.zookeeper.server.ZooKeeperServer)
> [2017-07-29 13:33:40,074] INFO Processed session termination for sessionid:
> 0x55bcff0f02d0002 (org.apache.zookeeper.server.PrepRequestProcessor)
> {code}
> According to
> http://mail-archives.apache.org/mod_mbox/zookeeper-user/201701.mbox/%3CB512F6DE-C0BF-45CE-8102-6F242988268E%40apache.org%3E
> from [~fpj], a ZK client in a new session shouldn't see the ephemeral node
> created in its previous session. So, could this be a potential bug in ZK
> during ZK leader transition?
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)