[ https://issues.apache.org/jira/browse/ZOOKEEPER-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Han reassigned ZOOKEEPER-3479: -------------------------------------- Assignee: Karolos Antoniadis > Logging false leader election times > ----------------------------------- > > Key: ZOOKEEPER-3479 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3479 > Project: ZooKeeper > Issue Type: Bug > Components: leaderElection > Affects Versions: 3.5.5 > Reporter: Karolos Antoniadis > Assignee: Karolos Antoniadis > Priority: Minor > Labels: pull-request-available > Attachments: server1.txt, server2.txt > > Time Spent: 2h 20m > Remaining Estimate: 0h > > There seems to be a problem with the logging of leader election times: the > logged times are much smaller than the actual time it took for the leader > election to complete. > This bug can be easily reproduced by following these steps: > 1) Run a ZK cluster of 3 servers > 2) Kill the server that is currently the leader > 3) The log files of the remaining 2 servers contain false leader election > times > > In the attached files you can see the log files of the remaining 2 serve. For > brevity, I removed the parts before and after the leader election from the > log files. > For example, in {{server1.txt}} we can see that: > > {code:java} > 2019-07-31 00:57:31,852 [myid:1] - WARN > [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):QuorumPeer@1318] - > PeerState set to LOOKING > 2019-07-31 00:57:31,853 [myid:1] - INFO > [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):QuorumPeer@1193] - > LOOKING > 2019-07-31 00:57:31,853 [myid:1] - INFO > [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):FastLeaderElection@885] > - New election. My id = 1, proposed zxid=0x100000001 > [...] > 2019-07-31 00:57:32,272 [myid:1] - INFO > [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):Follower@69] - > FOLLOWING - LEADER ELECTION TOOK - 1 MS{code} > Leader election supposedly took only 1ms, but in reality it took (32,272 - > 31,853) = 419ms! > The reason for this bug seems to be the introduction of this line > {code:java} > start_fle = Time.currentElapsedTime();{code} > (seen here > [https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1402]) > > back in this commit > [https://github.com/apache/zookeeper/commit/5428cd4bc963c2e653a260c458a8a8edf3fa08ef]. > > > > -- This message was sent by Atlassian JIRA (v7.6.14#76016)