Will Berkeley created KUDU-2452:
-----------------------------------

             Summary: Prevent follower from causing pre-elections when 
UpdateConsensus is slow
                 Key: KUDU-2452
                 URL: https://issues.apache.org/jira/browse/KUDU-2452
             Project: Kudu
          Issue Type: Improvement
    Affects Versions: 1.7.0
            Reporter: Will Berkeley


Thanks to pre-elections (KUDU-1365), slow UpdateConsensus calls on a single 
follower don't disturb the whole tablet by calling elections. However, 
sometimes I see situations where one or more followers are constantly calling 
pre-elections, and only rarely, if ever, overflowing their service queues. 
Occasionally, in 3x replicated tablets, the followers will get "lucky" and 
detect a leader failure at around the same time, and an election will happen.

This background instability has caused bugs like KUDU-2343 that should be rare 
to occur pretty frequently, plus the extra RequestConsensusVote RPCs add a 
little more stress on the consensus service and on replicas' consensus locks. 
It also spams the logs: there's generally no exponential backoff for these 
pre-elections, because a successful heartbeat arrives in between them.

It seems like we can get into a situation where the average number of 
in-flight consensus requests is constant over time, so on average we are 
processing each heartbeat in less than the heartbeat interval, but some 
heartbeats take longer. Since UpdateConsensus calls to a replica are 
serialized, a few slow ones in a row trigger the failure detector, even though 
the follower receives every heartbeat in a timely manner and eventually 
responds successfully (and, on average, in a timely manner).
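
To make the effect concrete, here is a toy simulation in Python (not Kudu 
code). It is deliberately crude: it assumes the failure detector is snoozed 
only when an UpdateConsensus call completes, which the description above does 
not confirm, and the 500 ms interval and 1500 ms timeout are illustrative 
values. The function and parameter names are hypothetical.

```python
def simulate(service_ms, interval_ms=500, timeout_ms=1500):
    """Toy model of one follower handling serialized heartbeats.

    Returns (completions, firings): when each call finished, and when the
    failure detector would first have expired in each over-long gap.
    All times are integer milliseconds.
    """
    finish = 0        # completion time of the previous call
    last_snooze = 0   # assumption: the detector is snoozed on completion
    completions, firings = [], []
    for i, service in enumerate(service_ms):
        arrival = i * interval_ms
        start = max(arrival, finish)  # serialized behind the previous call
        finish = start + service
        if finish - last_snooze > timeout_ms:
            # Record only the first expiry in this gap.
            firings.append(last_snooze + timeout_ms)
        last_snooze = finish
        completions.append(finish)
    return completions, firings


# Eleven fast calls and one slow one: mean service time is about 192 ms,
# well under the 500 ms heartbeat interval.
service = [100] * 4 + [1200] + [100] * 7
completions, firings = simulate(service)
print(sum(service) / len(service) < 500)  # True
print(firings)                            # [3100]
```

Every heartbeat is received and answered, and the mean service time stays far 
below the interval, yet the detector still expires once, at 3100 ms. In this 
simplified model a single slow call is enough to open the gap.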

It'd be nice to prevent these worthless pre-elections. A couple of ideas:
1. Separately calculate a backoff for failed pre-elections, and reset it when a 
pre-election succeeds or more generally when there's an election.
2. Don't count the time the follower is executing UpdateConsensus against the 
failure detector. [~mpercy] suggested stopping the failure detector during 
UpdateReplica() and resuming it when the function returns.
3. Move leader failure detection out-of-band of UpdateConsensus entirely.
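
Ideas 1 and 2 can be sketched roughly as follows, in Python rather than 
Kudu's C++. Every class and method name here is hypothetical and none of it 
is existing Kudu API. Restarting the full timeout on resume and doubling the 
delay after each failed pre-election are assumptions, not decisions made in 
this issue.

```python
import time
from contextlib import contextmanager


class FailureDetector:
    """Hypothetical leader-failure detector that can be paused (idea 2)."""

    def __init__(self, timeout, clock=time.monotonic):
        self._timeout = timeout
        self._clock = clock
        self._deadline = clock() + timeout
        self._paused = False

    def snooze(self):
        # Push the deadline a full timeout into the future.
        self._deadline = self._clock() + self._timeout

    def pause(self):
        self._paused = True

    def resume(self):
        # Assumption: resuming restarts the full timeout.
        self._paused = False
        self.snooze()

    def expired(self):
        return not self._paused and self._clock() >= self._deadline


@contextmanager
def paused(detector):
    """Keep the detector stopped while the wrapped update is executing."""
    detector.pause()
    try:
        yield
    finally:
        detector.resume()


class PreElectionBackoff:
    """Hypothetical backoff tracked only for failed pre-elections (idea 1)."""

    def __init__(self, base, cap):
        self._base = base
        self._cap = cap
        self._failures = 0

    def record_failure(self):
        self._failures += 1

    def reset(self):
        # Call when a pre-election succeeds or a real election happens.
        self._failures = 0

    def delay(self):
        # 0 before any failure, then base, 2*base, 4*base, ... up to cap.
        if self._failures == 0:
            return 0
        return min(self._cap, self._base * 2 ** (self._failures - 1))
```

A follower would wrap the body of its update handler in "with 
paused(detector):" and consult delay() before starting another pre-election. 
Because the backoff counter is kept separate from heartbeat handling, a 
successful heartbeat between two failed pre-elections would not reset it.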



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
