Please see comments inline.

Thanks,
Alex

On Mon, Jul 29, 2019 at 5:29 PM Karolos Antoniadis <karo...@gmail.com>
wrote:

> Hi ZooKeeper developers,
>
> ZooKeeper seems to be logging a "*LEADER ELECTION TOOK*" message even
> though no leader election takes place during a reconfiguration.
>
> This can be reproduced by following these steps:
> 1) start a ZooKeeper cluster (e.g., 3 participants)
> 2) start a client that connects to some follower
> 3) perform a *reconfig* operation that removes the leader from the cluster
>
> After the reconfiguration takes place, we can see that the log files of the
> remaining participants contain a "*LEADER ELECTION TOOK*" message. For
> example, a line that contains
>
> *2019-07-29 23:07:38,518 [myid:2] - INFO
>  [QuorumPeer[myid=2](plain=0.0.0.0:2792)(secure=disabled):Follower@75] -
> FOLLOWING - LEADER ELECTION TOOK - 57 MS*
>
> However, no leader election took place. By that, I mean that no server
> went *LOOKING* and then started voting and sending notifications to other
> participants, as would happen in a normal leader election.
> It seems that, before the *reconfig* is committed, the participant that is
> going to be the next leader is already decided (see here:
>
> https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java#L865
> ).
>
> I think the above issue raises the following questions:
> - Should we avoid logging LEADER ELECTION messages altogether in such
> cases?
>

In the specific scenario you described, the leader has changed, but our
heuristic for choosing the leader apparently worked and a new leader could
be elected without running the full election.
Note that we could be unlucky and the designated leader could be offline,
in which case we'd fall back on a full election. Either way, it's useful to
know how much time it takes to start following the new leader.


> - Or, should there be some logging for the time it took for the
> reconfiguration (e.g., the time from when a participant receives a
> *reconfig* operation until the operation is committed)? Would such a time
> value be useful?
>

IIRC the LEADER ELECTION message is used for this purpose. If you just look
at the time to commit the reconfig operation, you won't account for the work
that happens when the commit message is received, such as leader
re-election and role changes (follower->observer conversion and such),
which is what takes most of the time.
Committing a reconfig operation is usually not much more expensive than
committing a normal operation. But perhaps you're right that we should
measure the total time more accurately. Would you
like to open a Jira and perhaps take a stab at improving this?
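
As a rough illustration of what measuring the total time could look like,
here is a minimal sketch that records three timestamps (reconfig received,
reconfig committed, new role settled) and reports the phases separately.
Everything in it is hypothetical: ReconfigTimer, its method names and the
"RECONFIG TOOK" message are made up for this example and are not part of
ZooKeeper, and where such hooks would actually be called from is left open.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical helper (not ZooKeeper code): times the phases of one reconfig. */
final class ReconfigTimer {
    private final LongSupplier nanoClock; // injectable so the timer can be tested
    private Long receivedAt;              // reconfig operation received
    private Long committedAt;             // reconfig operation committed
    private Long settledAt;               // peer is following/leading/observing again

    ReconfigTimer() {
        this(System::nanoTime);           // monotonic clock, unaffected by wall-clock jumps
    }

    ReconfigTimer(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    void onReconfigReceived()  { receivedAt  = nanoClock.getAsLong(); }
    void onReconfigCommitted() { committedAt = nanoClock.getAsLong(); }
    void onRoleSettled()       { settledAt   = nanoClock.getAsLong(); }

    /** Time to commit the reconfig operation itself. */
    long commitMillis() { return elapsedMillis(receivedAt, committedAt); }

    /** Work after the commit: leader change, role changes, and so on. */
    long postCommitMillis() { return elapsedMillis(committedAt, settledAt); }

    /** End-to-end time as seen by this participant. */
    long totalMillis() { return elapsedMillis(receivedAt, settledAt); }

    String summary() {
        return String.format("RECONFIG TOOK - %d MS (commit %d MS, post-commit %d MS)",
                totalMillis(), commitMillis(), postCommitMillis());
    }

    private static long elapsedMillis(Long from, Long to) {
        if (from == null || to == null) {
            throw new IllegalStateException("phase not recorded yet");
        }
        return TimeUnit.NANOSECONDS.toMillis(to - from);
    }
}
```

Reporting the commit and post-commit phases separately would show directly
how much of the total is spent after the commit message is received.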

>
> Best,
> Karolos
>
