[
https://issues.apache.org/jira/browse/ZOOKEEPER-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066943#comment-17066943
]
Lasaro Camargos edited comment on ZOOKEEPER-3769 at 3/25/20, 7:20 PM:
----------------------------------------------------------------------
Thank you for the analysis, [~symat].
Wrt testing with NETTY: before trying SASL I did try plain NETTY, and the
behavior was exactly the same.
Wrt using an older JDK: I reverted all my changes to the configs and put back
the original version, 3.5.5, but didn't get to try another JDK. The problem no
longer reproduces, and I am still trying to figure out if/what I am missing
that might have changed the setup.
Regarding not handling the BufferUnderflowException properly: yes, that makes
sense. The thread died and wasn't recreated, so no more messages were ever
received.
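The following minimal Java sketch illustrates the failure mode just described. It is not ZooKeeper's actual leader-election code. The class ResilientReceiver, its methods decode, drain and frame, and the 4-byte length-prefix framing are all hypothetical, chosen only for illustration. A truncated frame makes ByteBuffer.get throw the unchecked BufferUnderflowException. If that exception escapes the receive loop, the thread running the loop dies, and nothing after that point is processed. Catching the exception per frame lets the loop drop the bad message and continue. The example uses List.of, so it needs Java 9 or newer.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not ZooKeeper code: a receive loop that survives
// malformed frames instead of letting the exception kill the thread.
public class ResilientReceiver {

    // Assumed framing: 4-byte big-endian length followed by that many bytes.
    // Throws BufferUnderflowException if the frame is shorter than announced.
    static String decode(ByteBuffer frame) {
        int len = frame.getInt();
        byte[] body = new byte[len];
        frame.get(body);
        return new String(body, StandardCharsets.UTF_8);
    }

    // Decodes every frame; a truncated one is logged and skipped, so the
    // loop (and the thread running it) keeps processing later messages.
    static List<String> drain(List<ByteBuffer> frames) {
        List<String> received = new ArrayList<>();
        for (ByteBuffer frame : frames) {
            try {
                received.add(decode(frame));
            } catch (BufferUnderflowException e) {
                System.err.println("dropping malformed frame: " + e);
            }
        }
        return received;
    }

    // Builds a well-formed frame for the example.
    static ByteBuffer frame(String s) {
        byte[] body = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + body.length);
        buf.putInt(body.length).put(body);
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        // Announces 10 body bytes but carries none, so decode() underflows.
        ByteBuffer truncated = ByteBuffer.allocate(4).putInt(10);
        truncated.flip();
        List<String> got = drain(List.of(frame("vote:1"), truncated, frame("vote:2")));
        System.out.println(got); // prints [vote:1, vote:2]
    }
}
```

Running main prints [vote:1, vote:2]. The truncated middle frame is dropped instead of ending the loop. Without the try/catch in drain, the exception would propagate out of the loop, which matches the "thread died and wasn't recreated" symptom. Dropping the frame, closing the connection, or restarting the receiver are all possible reactions. This sketch does not decide which one is right for ZooKeeper.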
> fast leader election does not end if leader is taken down
> ---------------------------------------------------------
>
> Key: ZOOKEEPER-3769
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3769
> Project: ZooKeeper
> Issue Type: Bug
> Components: leaderElection
> Affects Versions: 3.5.7
> Reporter: Lasaro Camargos
> Assignee: Mate Szalay-Beko
> Priority: Major
> Attachments: node1.log, node2.log, node3.log
>
>
> In a cluster with three nodes, node3 is the leader and the other nodes are
> followers.
> If I stop node3, the other two nodes do not finish the leader election.
> This is happening with ZK 3.5.7, openjdk version "12.0.2" 2019-07-16, and
> this config:
>
> tickTime=2000
> initLimit=30
> syncLimit=3
> dataDir=/hedvig/hpod/data
> dataLogDir=/hedvig/hpod/log
> clientPort=2181
> snapCount=100000
> autopurge.snapRetainCount=3
> autopurge.purgeInterval=1
> skipACL=yes
> preAllocSize=65536
> maxClientCnxns=0
> 4lw.commands.whitelist=*
> admin.enableServer=false
> server.1=companydemo1.snc4.companyinc.com:3000:4000
> server.2=companydemo2.snc4.companyinc.com:3000:4000
> server.3=companydemo3.snc4.companyinc.com:3000:4000
>
> Could you have a look at the logs and help me figure this out? It seems like
> node1 is not getting notifications back from node2, but I don't see anything
> wrong with the network, so I am wondering whether bugs like ZOOKEEPER-3756
> could be causing it.
>
> In the logs, node3 is killed at 11:17:14, node2 at 11:17:50, and node1 at
> 11:18:02.
>
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)