-------- Original message --------
From: "Jungtaek Lim (JIRA)" <[email protected]>
Date: 2016/07/05 4:41 PM (GMT+08:00)
To: [email protected]
Subject: [jira] [Resolved] (STORM-1941) Nimbus discovery can fail when zookeeper reconnect happens.

     [ https://issues.apache.org/jira/browse/STORM-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim resolved STORM-1941.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.0
                   1.0.2
                   2.0.0

Merged to master, 1.x-branch by Harsha, and 1.0.x-branch by me.

> Nimbus discovery can fail when zookeeper reconnect happens.
> -----------------------------------------------------------
>
>                 Key: STORM-1941
>                 URL: https://issues.apache.org/jira/browse/STORM-1941
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 1.0.0, 1.0.1
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Critical
>             Fix For: 2.0.0, 1.0.2, 1.1.0
>
>
> When zookeeper reconnect happens, nimbus registry can be deleted though
> nimbus is alive.
> Below is zookeeper node for nimbus registry.
> {code}
> get /storm/nimbuses/<host>:6627
> ?f`d``??????M?-?-.?/??5??/H?+.IL???ON??``b`?|???^^???????
> ?'h?g?g?g?g
> t-?,[??Q
> cZxid = 0x4000005ae
> ctime = Fri Jul 01 11:43:51 UTC 2016
> mZxid = 0x4000005ae
> mtime = Fri Jul 01 11:43:51 UTC 2016
> pZxid = 0x4000005ae
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x255a62e310c0005
> dataLength = 98
> numChildren = 0
> {code}
> {code}
> get /storm/nimbuses/<host>:6627
> ?f`d``??????M?-?-.?/??5??/H?+.IL???ON??``b`?|???^^???????
> ?'h?g?g?g?g
> t-?,[??Q
> cZxid = 0x4000005ae
> ctime = Fri Jul 01 11:43:51 UTC 2016
> mZxid = 0x50000000e
> mtime = Fri Jul 01 11:46:08 UTC 2016
> pZxid = 0x4000005ae
> cversion = 0
> dataVersion = 1
> aclVersion = 0
> ephemeralOwner = 0x255a62e310c0005
> dataLength = 98
> numChildren = 0
> {code}
> Below is transaction log for that node.
> {code}
> 7/1/16 11:43:51 AM UTC session 0x255a62e310c0005 cxid 0xd zxid 0x4000005ae
> create
> '/storm/nimbuses/<host>:6627,#1fffffff8b80000000ffffffe36660646060ffffff90ffffffcfffffffcaffffffc9ffffffccffffffd54dffffffcc2dffffffd62d2effffffc92fffffffcaffffffd535ffffffd2ffffffcb2f48ffffffcd2b2e494cffffffceffffffceffffffc94f4effffffccffffffe160606260ffffff907cffffffccffffffc1ffffffc01c5e165effffffceffffffc4ffffffc0ffffffc2ffffffc0ffffffcdffffffc0affffffd42768ffffffa867ffffffa067ffffffa867ffffffa467affffffa4d742dffffff8c2c1805b14ffffffc2ffffffaf51000,v{s{31,s{'world,'anyone}}},T,10
> 7/1/16 11:46:08 AM UTC session 0x355a647bd8c0000 cxid 0x3 zxid 0x50000000e
> setData
> '/storm/nimbuses/<host>:6627,#1fffffff8b80000000ffffffe36660646060ffffff90ffffffcfffffffcaffffffc9ffffffccffffffd54dffffffcc2dffffffd62d2effffffc92fffffffcaffffffd535ffffffd2ffffffcb2f48ffffffcd2b2e494cffffffceffffffceffffffc94f4effffffccffffffe160606260ffffff907cffffffccffffffc1ffffffc01c5e165effffffceffffffc4ffffffc0ffffffc2ffffffc0ffffffcdffffffc0affffffd42768ffffffa867ffffffa067ffffffa867ffffffa467affffffa4d742dffffff8c2c1805b14ffffffc2ffffffaf51000,1
> {code}
> Please take a look at ctime, mtime, and ephemeralOwner.
> The ephemeral owner session was already closed from the nimbus side, but
> the node may not be deleted immediately. So the new session does not
> create a new node; instead it sets the value on an ephemeral node that
> still belongs to the other, already-closed session.
> *And eventually that node is deleted although session 0x355a647bd8c0000 is
> alive.*
> {code}
> 2016-07-01 11:45:05.675 o.a.s.s.o.a.z.ClientCnxn [DEBUG] Disconnecting client
> for session: 0x255a62e310c0005
> 2016-07-01 11:45:05.675 o.a.s.s.o.a.z.ZooKeeper [INFO] Session:
> 0x255a62e310c0005 closed
> {code}
> We can delete the node first and then create the ephemeral node when the
> reconnect event handler is called.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
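
For readers following the thread, the fix idea quoted above (delete the node first, then create the ephemeral node from the reconnect handler) can be illustrated with a small self-contained model. This is only a sketch under stated assumptions, not the actual STORM-1941 patch: `EphemeralStore`, `InMemoryStore`, `reRegister`, and `expireSession` are hypothetical names made up for illustration, and the in-memory map merely stands in for ZooKeeper's tracking of which session owns each ephemeral node.

```java
import java.util.Map;

/** Sketch of "delete first, then create" re-registration; every name here is hypothetical. */
final class NimbusReRegistration {

    /** Hypothetical minimal client view; this is NOT the real ZooKeeper or Curator API. */
    interface EphemeralStore {
        boolean exists(String path);
        void delete(String path);
        void createEphemeral(String path, byte[] data);
    }

    /** Intended to be called from the reconnect handler. */
    static void reRegister(EphemeralStore zk, String path, byte[] data) {
        if (zk.exists(path)) {
            // A leftover node may still be owned by the previous, already-closed
            // session, so writing into it would not make it ours. Remove it instead.
            zk.delete(path);
        }
        // The fresh node is owned by the current session.
        zk.createEphemeral(path, data);
    }

    /** In-memory stand-in that only records which session owns each ephemeral node. */
    static final class InMemoryStore implements EphemeralStore {
        private final Map<String, Long> owners;
        private final long sessionId;

        InMemoryStore(Map<String, Long> owners, long sessionId) {
            this.owners = owners;
            this.sessionId = sessionId;
        }

        @Override public boolean exists(String path) { return owners.containsKey(path); }

        @Override public void delete(String path) { owners.remove(path); }

        @Override public void createEphemeral(String path, byte[] data) {
            if (owners.containsKey(path)) {
                throw new IllegalStateException("node already exists: " + path);
            }
            owners.put(path, sessionId);
        }
    }

    /** Models the server cleaning up a closed session: every node it owns is removed. */
    static void expireSession(Map<String, Long> owners, long sessionId) {
        owners.values().removeIf(owner -> owner == sessionId);
    }
}
```

In this model, a node re-created by session 0x355a647bd8c0000 survives the later cleanup of session 0x255a62e310c0005, whereas a node that was only updated with setData would still be owned by the old session and would disappear with it. A real client would additionally need to handle the window between the delete and the create, and the case where the old node vanishes on its own in between.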
