[ https://issues.apache.org/jira/browse/CURATOR-188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484896#comment-14484896 ]
Henrik Nordvik commented on CURATOR-188:
----------------------------------------
We are experiencing a similar issue. After network problems, no leader is
elected. We use the LeaderSelector pattern and receive the "reconnected"
event, yet no leader is elected because a hanging lock is still present.
{code}
[zk: localhost:20101(CONNECTED) 0] ls /app/leader/SR
[_c_eadb5f95-ea3c-4bf5-b7b1-c089df38a2bd-lock-0000000746,
_c_3c9fd125-e3ce-4ca3-919f-0f5968c2c12c-lock-0000000745,
_c_87358962-171c-4ce2-a34b-92038b400e8d-lock-0000000744]
[zk: localhost:20101(CONNECTED) 1] get /app/leader/SR/_c_eadb5f95-ea3c-4bf5-b7b1-c089df38a2bd-lock-0000000746
10.0.0.148
cZxid = 0x2900012cec
ctime = Sun Mar 29 03:56:17 CEST 2015
mZxid = 0x2900012cec
mtime = Sun Mar 29 03:56:17 CEST 2015
pZxid = 0x2900012cec
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x34c5d99eea20001
dataLength = 10
numChildren = 0
[zk: localhost:20101(CONNECTED) 2] get /app/leader/SR/_c_3c9fd125-e3ce-4ca3-919f-0f5968c2c12c-lock-0000000745
10.0.0.151
cZxid = 0x290000256c
ctime = Sat Mar 28 05:19:43 CET 2015
mZxid = 0x290000256c
mtime = Sat Mar 28 05:19:43 CET 2015
pZxid = 0x290000256c
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x14c5d99f0850000
dataLength = 10
numChildren = 0
[zk: localhost:20101(CONNECTED) 3] get /app/leader/SR/_c_87358962-171c-4ce2-a34b-92038b400e8d-lock-0000000744
10.0.0.148
cZxid = 0x29000007bb
ctime = Sat Mar 28 01:24:50 CET 2015
mZxid = 0x29000007bb
mtime = Sat Mar 28 01:24:50 CET 2015
pZxid = 0x29000007bb
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x34c5d99eea20001
dataLength = 10
numChildren = 0
{code}
When we stop the node that holds two locks (10.0.0.148; both of its lock nodes
have ephemeralOwner 0x34c5d99eea20001), both locks disappear and the other
node is elected leader.
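
The listing above can be checked mechanically. As a minimal sketch (not part of Curator; the function names are hypothetical, and the node names and ephemeralOwner values are copied by hand from the zkCli output), the following Python groups the lock nodes by owning session and finds the node with the lowest sequence number. It assumes the usual lock-recipe rule that the node with the lowest sequence number holds the lock.

```python
import re
from collections import defaultdict

# Lock node names and ephemeralOwner values copied from the zkCli output above.
LOCK_OWNERS = {
    "_c_eadb5f95-ea3c-4bf5-b7b1-c089df38a2bd-lock-0000000746": 0x34C5D99EEA20001,
    "_c_3c9fd125-e3ce-4ca3-919f-0f5968c2c12c-lock-0000000745": 0x14C5D99F0850000,
    "_c_87358962-171c-4ce2-a34b-92038b400e8d-lock-0000000744": 0x34C5D99EEA20001,
}


def sequence_of(node_name):
    """Return the trailing sequence number of a lock node name."""
    match = re.search(r"-lock-(\d+)$", node_name)
    if match is None:
        raise ValueError(f"not a lock node: {node_name!r}")
    return int(match.group(1))


def sessions_with_multiple_locks(lock_owners):
    """Map each session id owning more than one lock node to its nodes.

    The nodes of each session are sorted by ascending sequence number.
    """
    by_session = defaultdict(list)
    for node, owner in lock_owners.items():
        by_session[owner].append(node)
    return {
        owner: sorted(nodes, key=sequence_of)
        for owner, nodes in by_session.items()
        if len(nodes) > 1
    }


def lowest_sequence_node(lock_owners):
    """Return the node that would hold the lock (assumed: lowest sequence)."""
    return min(lock_owners, key=sequence_of)


if __name__ == "__main__":
    for owner, nodes in sessions_with_multiple_locks(LOCK_OWNERS).items():
        print(f"session {owner:#x} owns {len(nodes)} lock nodes:")
        for node in nodes:
            print(f"  {node}")
    print("lowest sequence:", lowest_sequence_node(LOCK_OWNERS))
```

For the data above, session 0x34c5d99eea20001 owns both -lock-0000000744 and -lock-0000000746, and 744 is the lowest sequence number. That would be consistent with an older node from the same session still sitting at the front of the queue.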
> Cannot determine the leader if zookeeper leader fails
> -----------------------------------------------------
>
> Key: CURATOR-188
> URL: https://issues.apache.org/jira/browse/CURATOR-188
> Project: Apache Curator
> Issue Type: Bug
> Components: Framework
> Affects Versions: 2.7.1
> Reporter: Rodrigo Nogueira
>
> Hi,
> I'm trying to upgrade the curator framework from 2.6.0 to 2.7.1, but I'm
> having some problems.
> In version 2.6.0 almost everything works fine, except for
> ServiceDiscovery.updateService(), which is already fixed in 2.7.1.
> In version 2.7.1, when I kill the ZooKeeper leader, my path for leader
> election becomes inconsistent.
> For instance, I have three apps registered in the leader path
> (/com/myapp/leader/):
> [_c_85089ba7-0819-40a2-90b5-640bcb5e9e68-lock-0000000003,
> _c_070619f6-539e-4784-8068-bdc66d2a25bc-lock-0000000005,
> _c_54a126d3-31e8-464f-9216-5e0ad23fad1b-lock-0000000004]
> After killing the ZooKeeper leader, what I get in /com/myapp/leader/ is:
> [_c_648d5311-a59c-4bc4-bf32-c0605dea9b6a-lock-0000000007,
> _c_85089ba7-0819-40a2-90b5-640bcb5e9e68-lock-0000000003,
> _c_f51f9660-3cbf-4ba8-8dba-c1e04ca14a93-lock-0000000008,
> _c_49696b77-e45a-40b6-8feb-96623c67fd85-lock-0000000006]
> Sometimes I get even more nodes (five or six).
> I'm aware that Curator removes and re-adds all nodes when a ZooKeeper node
> fails, but it seems that the previous nodes are not being removed correctly.
> Is that the expected behavior?
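
The two listings in the quoted report can be compared directly. As a minimal sketch (plain Python, not Curator code; `diff_children` is a hypothetical helper and the node names are copied by hand from the report), this computes which lock nodes survived the ZooKeeper leader failure, which were removed, and which were newly created.

```python
# Children of /com/myapp/leader/ before and after the ZooKeeper leader was
# killed, copied from the quoted report.
BEFORE = [
    "_c_85089ba7-0819-40a2-90b5-640bcb5e9e68-lock-0000000003",
    "_c_070619f6-539e-4784-8068-bdc66d2a25bc-lock-0000000005",
    "_c_54a126d3-31e8-464f-9216-5e0ad23fad1b-lock-0000000004",
]
AFTER = [
    "_c_648d5311-a59c-4bc4-bf32-c0605dea9b6a-lock-0000000007",
    "_c_85089ba7-0819-40a2-90b5-640bcb5e9e68-lock-0000000003",
    "_c_f51f9660-3cbf-4ba8-8dba-c1e04ca14a93-lock-0000000008",
    "_c_49696b77-e45a-40b6-8feb-96623c67fd85-lock-0000000006",
]


def diff_children(before, after):
    """Classify child nodes as survived, removed, or created."""
    before_set, after_set = set(before), set(after)
    return {
        "survived": sorted(before_set & after_set),
        "removed": sorted(before_set - after_set),
        "created": sorted(after_set - before_set),
    }


if __name__ == "__main__":
    for kind, nodes in diff_children(BEFORE, AFTER).items():
        print(f"{kind} ({len(nodes)}):")
        for node in nodes:
            print(f"  {node}")
```

With the reported data, one old node (-lock-0000000003) survives alongside three newly created nodes, which leaves four lock nodes for three applications.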