[ https://issues.apache.org/jira/browse/STORM-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jungtaek Lim updated STORM-1941:
--------------------------------
Description:
When a ZooKeeper reconnect happens, the Nimbus registry node can be deleted
even though Nimbus is alive.
Below is the ZooKeeper node for the Nimbus registry, shown twice: as created,
and after it was later modified.
{code}
get /storm/nimbuses/<host>:6627
?f`d``??????M?-?-.?/??5??/H?+.IL???ON??``b`?|???^^???????
?'h?g?g?g?g
t-?,[??Q
cZxid = 0x4000005ae
ctime = Fri Jul 01 11:43:51 UTC 2016
mZxid = 0x4000005ae
mtime = Fri Jul 01 11:43:51 UTC 2016
pZxid = 0x4000005ae
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x255a62e310c0005
dataLength = 98
numChildren = 0
{code}
{code}
get /storm/nimbuses/<host>:6627
?f`d``??????M?-?-.?/??5??/H?+.IL???ON??``b`?|???^^???????
?'h?g?g?g?g
t-?,[??Q
cZxid = 0x4000005ae
ctime = Fri Jul 01 11:43:51 UTC 2016
mZxid = 0x50000000e
mtime = Fri Jul 01 11:46:08 UTC 2016
pZxid = 0x4000005ae
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x255a62e310c0005
dataLength = 98
numChildren = 0
{code}
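The two mZxid values above can be read more closely. In ZooKeeper, the high 32
bits of a zxid are the leader epoch and the low 32 bits are a counter within
that epoch. The following is a minimal sketch of that split; the helper name
split_zxid is hypothetical and not part of any ZooKeeper tool.

```python
from typing import Tuple


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Split a 64-bit zxid into (epoch, counter).

    Hypothetical helper for reading the stat output; it only does the
    bit arithmetic and does not talk to ZooKeeper.
    """
    return zxid >> 32, zxid & 0xFFFFFFFF


# mZxid of the node as created, and after the later modification.
created = split_zxid(0x4000005AE)
modified = split_zxid(0x50000000E)

print("created  epoch=%d counter=%d" % created)
print("modified epoch=%d counter=%d" % modified)
```

The epoch differs between the two writes (4, then 5), which suggests a leader
election took place between them. That is consistent with the reconnect
described here, although the output alone does not prove the cause.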
Below is the transaction log for that node.
{code}
7/1/16 11:43:51 AM UTC session 0x255a62e310c0005 cxid 0xd zxid 0x4000005ae
create
'/storm/nimbuses/<host>:6627,#1fffffff8b80000000ffffffe36660646060ffffff90ffffffcfffffffcaffffffc9ffffffccffffffd54dffffffcc2dffffffd62d2effffffc92fffffffcaffffffd535ffffffd2ffffffcb2f48ffffffcd2b2e494cffffffceffffffceffffffc94f4effffffccffffffe160606260ffffff907cffffffccffffffc1ffffffc01c5e165effffffceffffffc4ffffffc0ffffffc2ffffffc0ffffffcdffffffc0affffffd42768ffffffa867ffffffa067ffffffa867ffffffa467affffffa4d742dffffff8c2c1805b14ffffffc2ffffffaf51000,v{s{31,s{'world,'anyone}}},T,10
7/1/16 11:46:08 AM UTC session 0x355a647bd8c0000 cxid 0x3 zxid 0x50000000e
setData
'/storm/nimbuses/<host>:6627,#1fffffff8b80000000ffffffe36660646060ffffff90ffffffcfffffffcaffffffc9ffffffccffffffd54dffffffcc2dffffffd62d2effffffc92fffffffcaffffffd535ffffffd2ffffffcb2f48ffffffcd2b2e494cffffffceffffffceffffffc94f4effffffccffffffe160606260ffffff907cffffffccffffffc1ffffffc01c5e165effffffceffffffc4ffffffc0ffffffc2ffffffc0ffffffcdffffffc0affffffd42768ffffffa867ffffffa067ffffffa867ffffffa467affffffa4d742dffffff8c2c1805b14ffffffc2ffffffaf51000,1
{code}
Please take a look at ctime, mtime, and ephemeralOwner.
The ephemeral owner session had already been closed on the Nimbus side, but
the node may not be deleted immediately. As a result, the new session does not
create a new node; instead it sets the value on an ephemeral node that belongs
to another session, one that is already closed.
*And eventually that node is deleted although session 0x355a647bd8c0000 is
alive.*
{code}
2016-07-01 11:45:05.675 o.a.s.s.o.a.z.ClientCnxn [DEBUG] Disconnecting client for session: 0x255a62e310c0005
2016-07-01 11:45:05.675 o.a.s.s.o.a.z.ZooKeeper [INFO] Session: 0x255a62e310c0005 closed
{code}
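The mismatch can be checked mechanically from the stat output and the
transaction log above: find which session issued the write recorded in mZxid
and compare it with ephemeralOwner. The following is a rough sketch only; the
names TXN_HEADER, writers_by_zxid and last_writer_is_owner are hypothetical,
and the stat values are copied by hand into dictionaries rather than parsed
from zkCli.

```python
import re

# Header lines of the two transactions shown above (payloads omitted).
TXN_LOG = """\
7/1/16 11:43:51 AM UTC session 0x255a62e310c0005 cxid 0xd zxid 0x4000005ae
7/1/16 11:46:08 AM UTC session 0x355a647bd8c0000 cxid 0x3 zxid 0x50000000e
"""

# Assumed header layout, taken from the two lines above.
TXN_HEADER = re.compile(
    r"session (0x[0-9a-f]+) cxid 0x[0-9a-f]+ zxid (0x[0-9a-f]+)"
)


def writers_by_zxid(txn_log):
    """Map each zxid in the log to the session id that issued it."""
    return {
        int(m.group(2), 16): int(m.group(1), 16)
        for m in TXN_HEADER.finditer(txn_log)
    }


def last_writer_is_owner(stat, txn_log):
    """True if the session behind mZxid also owns the ephemeral node."""
    writer = writers_by_zxid(txn_log).get(stat["mZxid"])
    return writer == stat["ephemeralOwner"]


# Values copied from the two "get" outputs above.
STAT_CREATED = {"mZxid": 0x4000005AE, "ephemeralOwner": 0x255A62E310C0005}
STAT_MODIFIED = {"mZxid": 0x50000000E, "ephemeralOwner": 0x255A62E310C0005}

print(last_writer_is_owner(STAT_CREATED, TXN_LOG))   # True
print(last_writer_is_owner(STAT_MODIFIED, TXN_LOG))  # False
```

The second result is the problem case: the node was last written by session
0x355a647bd8c0000 but is still owned by the closed session 0x255a62e310c0005.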
We can delete the node first and then set the ephemeral node when the
reconnect event handler is called.
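To illustrate why the order matters, here is a toy in-memory model of
ephemeral-node ownership. It is not the ZooKeeper or Curator API and not the
actual Storm code; ToyZk, register_buggy, register_fixed and survives are
hypothetical names, and the model assumes only that setting data leaves the
owner unchanged and that expiring a session removes the nodes it owns.

```python
OLD_SESSION = 0x255A62E310C0005
NEW_SESSION = 0x355A647BD8C0000
PATH = "/storm/nimbuses/<host>:6627"


class ToyZk:
    """In-memory stand-in for ephemeral-node ownership (not the real API)."""

    def __init__(self):
        self.nodes = {}  # path -> {"data": bytes, "owner": session id}

    def create_ephemeral(self, session, path, data):
        if path in self.nodes:
            raise KeyError("node exists: " + path)
        self.nodes[path] = {"data": data, "owner": session}

    def set_data(self, path, data):
        # Only the data changes; the ephemeral owner stays the same.
        self.nodes[path]["data"] = data

    def delete(self, path):
        self.nodes.pop(path, None)

    def expire(self, session):
        # Session cleanup removes every node owned by that session.
        for path in [p for p, n in self.nodes.items() if n["owner"] == session]:
            del self.nodes[path]


def register_buggy(zk, session, path, data):
    """Reuse the node if it still exists, as the transaction log shows."""
    if path in zk.nodes:
        zk.set_data(path, data)
    else:
        zk.create_ephemeral(session, path, data)


def register_fixed(zk, session, path, data):
    """Proposed order: delete first, then create under the new session."""
    zk.delete(path)
    zk.create_ephemeral(session, path, data)


def survives(register):
    """Replay the scenario and report whether the registry node remains."""
    zk = ToyZk()
    zk.create_ephemeral(OLD_SESSION, PATH, b"nimbus-info")
    register(zk, NEW_SESSION, PATH, b"nimbus-info")  # reconnect handler runs
    zk.expire(OLD_SESSION)                           # delayed cleanup
    return PATH in zk.nodes


print(survives(register_buggy))  # False
print(survives(register_fixed))  # True
```

In the toy model the delete-then-create order makes the new session the owner,
so the late cleanup of the old session no longer removes the node.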
was:
When a ZooKeeper reconnect happens, the Nimbus registry node can be deleted
even though Nimbus is alive.
Below is the ZooKeeper node for the Nimbus registry, shown twice: as created,
and after it was later modified.
{code}
get /storm/nimbuses/<host>:6627
?f`d``??????M?-?-.?/??5??/H?+.IL???ON??``b`?|???^^???????
?'h?g?g?g?g
t-?,[??Q
cZxid = 0x4000005ae
ctime = Fri Jul 01 11:43:51 UTC 2016
mZxid = 0x4000005ae
mtime = Fri Jul 01 11:43:51 UTC 2016
pZxid = 0x4000005ae
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x255a62e310c0005
dataLength = 98
numChildren = 0
{code}
{code}
get /storm/nimbuses/<host>:6627
?f`d``??????M?-?-.?/??5??/H?+.IL???ON??``b`?|???^^???????
?'h?g?g?g?g
t-?,[??Q
cZxid = 0x4000005ae
ctime = Fri Jul 01 11:43:51 UTC 2016
mZxid = 0x50000000e
mtime = Fri Jul 01 11:46:08 UTC 2016
pZxid = 0x4000005ae
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x255a62e310c0005
dataLength = 98
numChildren = 0
{code}
Below is the transaction log for that node.
{code}
7/1/16 11:43:51 AM UTC session 0x255a62e310c0005 cxid 0xd zxid 0x4000005ae
create
'/storm/nimbuses/<host>:6627,#1fffffff8b80000000ffffffe36660646060ffffff90ffffffcfffffffcaffffffc9ffffffccffffffd54dffffffcc2dffffffd62d2effffffc92fffffffcaffffffd535ffffffd2ffffffcb2f48ffffffcd2b2e494cffffffceffffffceffffffc94f4effffffccffffffe160606260ffffff907cffffffccffffffc1ffffffc01c5e165effffffceffffffc4ffffffc0ffffffc2ffffffc0ffffffcdffffffc0affffffd42768ffffffa867ffffffa067ffffffa867ffffffa467affffffa4d742dffffff8c2c1805b14ffffffc2ffffffaf51000,v{s{31,s{'world,'anyone}}},T,10
7/1/16 11:46:08 AM UTC session 0x355a647bd8c0000 cxid 0x3 zxid 0x50000000e
setData
'/storm/nimbuses/<host>:6627,#1fffffff8b80000000ffffffe36660646060ffffff90ffffffcfffffffcaffffffc9ffffffccffffffd54dffffffcc2dffffffd62d2effffffc92fffffffcaffffffd535ffffffd2ffffffcb2f48ffffffcd2b2e494cffffffceffffffceffffffc94f4effffffccffffffe160606260ffffff907cffffffccffffffc1ffffffc01c5e165effffffceffffffc4ffffffc0ffffffc2ffffffc0ffffffcdffffffc0affffffd42768ffffffa867ffffffa067ffffffa867ffffffa467affffffa4d742dffffff8c2c1805b14ffffffc2ffffffaf51000,1
{code}
Please take a look at ctime, mtime, and ephemeralOwner.
The ephemeral owner session had already been closed on the Nimbus side, but
the node may not be deleted immediately. As a result, the new session does not
create a new node; instead it sets the value on an ephemeral node that belongs
to another session, one that is already closed.
{code}
2016-07-01 11:45:05.675 o.a.s.s.o.a.z.ClientCnxn [DEBUG] Disconnecting client for session: 0x255a62e310c0005
2016-07-01 11:45:05.675 o.a.s.s.o.a.z.ZooKeeper [INFO] Session: 0x255a62e310c0005 closed
{code}
We can delete the node first and then set the ephemeral node when the
reconnect event handler is called.
> Nimbus discovery can fail when zookeeper reconnect happens.
> -----------------------------------------------------------
>
> Key: STORM-1941
> URL: https://issues.apache.org/jira/browse/STORM-1941
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Affects Versions: 1.0.0, 1.0.1
> Reporter: Jungtaek Lim
> Assignee: Jungtaek Lim
> Priority: Critical