[ https://issues.apache.org/jira/browse/KUDU-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484697#comment-16484697 ]

Todd Lipcon commented on KUDU-2452:
-----------------------------------

I think we already do stop the failure detector during UpdateReplica, at least 
while we're waiting on the log, don't we?
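
A minimal sketch of that idea, in case it helps frame the discussion (all 
names here are hypothetical, not Kudu's actual classes): an RAII guard 
disables leader-failure detection for exactly as long as UpdateReplica() is 
running, so time spent waiting on the log can't count as missed heartbeats.

    #include <chrono>
    #include <mutex>

    // Toy failure detector; a real one is driven by a timer thread.
    class FailureDetector {
     public:
      using Clock = std::chrono::steady_clock;

      void Snooze() {
        std::lock_guard<std::mutex> l(lock_);
        ++snoozed_;
      }
      void Unsnooze() {
        std::lock_guard<std::mutex> l(lock_);
        // Restart the countdown once no UpdateReplica() is in flight.
        if (--snoozed_ == 0) last_heard_ = Clock::now();
      }
      bool LeaderLooksDead(std::chrono::milliseconds timeout) {
        std::lock_guard<std::mutex> l(lock_);
        if (snoozed_ > 0) return false;  // detection is paused
        return Clock::now() - last_heard_ > timeout;
      }

     private:
      std::mutex lock_;
      int snoozed_ = 0;
      Clock::time_point last_heard_ = Clock::now();
    };

    // Pauses detection for the lifetime of the guard.
    class ScopedSnooze {
     public:
      explicit ScopedSnooze(FailureDetector* fd) : fd_(fd) { fd_->Snooze(); }
      ~ScopedSnooze() { fd_->Unsnooze(); }
     private:
      FailureDetector* fd_;
    };

    void UpdateReplica(FailureDetector* fd /* , request, response */) {
      ScopedSnooze snooze(fd);  // detection off from here...
      // ... enqueue ops, wait on the log, send the response ...
    }                           // ...and back on (fresh timeout) here.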

The issue I've seen more is that enough tablets are blocked in UpdateConsensus 
that other (unrelated) tablets can't get their heartbeats processed due to 
queue overflows. Those _other_ victim tablets then end up calling 
pre-elections, which only contribute more to the load.

I think the best starting point for this would be KUDU-1707, which would allow 
simple liveness heartbeats to continue to get through even when tablets are 
holding up the threads.
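
The gist of that direction, as a hypothetical sketch (not Kudu's actual RPC 
service code, and not necessarily the KUDU-1707 design): route requests that 
carry no ops, i.e. pure liveness heartbeats, to their own small queue so they 
can still be acknowledged while the main queue is backed up behind 
UpdateConsensus calls blocked on the log.

    #include <functional>
    #include <queue>
    #include <utility>

    // Two queues drained by separate thread pools: bulky consensus
    // updates vs. tiny liveness heartbeats that never touch the log.
    struct ConsensusServiceQueues {
      std::queue<std::function<void()>> update_queue;     // may back up
      std::queue<std::function<void()>> heartbeat_queue;  // stays shallow

      void Dispatch(bool has_ops, std::function<void()> handler) {
        (has_ops ? update_queue : heartbeat_queue).push(std::move(handler));
      }
    };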

> Prevent follower from causing pre-elections when UpdateConsensus is slow
> ------------------------------------------------------------------------
>
>                 Key: KUDU-2452
>                 URL: https://issues.apache.org/jira/browse/KUDU-2452
>             Project: Kudu
>          Issue Type: Improvement
>    Affects Versions: 1.7.0
>            Reporter: Will Berkeley
>            Priority: Major
>
> Thanks to pre-elections (KUDU-1365), slow UpdateConsensus calls on a single 
> follower don't disturb the whole tablet by calling elections. However, 
> sometimes I see situations where one or more followers are constantly calling 
> pre-elections, and only rarely, if ever, overflowing their service queues. 
> Occasionally, in 3x replicated tablets, the followers will get "lucky" and 
> detect a leader failure at around the same time, and an election will happen.
> This background instability has caused bugs like KUDU-2343, which should be 
> rare, to occur pretty frequently, and the extra RequestConsensusVote RPCs add 
> a little more stress on the consensus service and on replicas' consensus 
> locks. It also spams the logs, since there's generally no exponential 
> backoff for these pre-elections, because there's a successful heartbeat in 
> between them.
> It seems like we can get into a situation where the average number of 
> in-flight consensus requests is constant over time, so on average each 
> heartbeat is processed in less than the heartbeat interval; however, some 
> heartbeats take longer. Since UpdateConsensus calls to a replica are 
> serialized, a few of these slow calls in a row trigger the failure detector, 
> despite the follower receiving every heartbeat in a timely manner and 
> eventually responding successfully (and, on average, in a timely manner).
> It'd be nice to prevent these worthless pre-elections. A few ideas:
> 1. Separately calculate a backoff for failed pre-elections, and reset it when 
> a pre-election succeeds or, more generally, when there's an election (see the 
> sketch after this list).
> 2. Don't count the time the follower is executing UpdateConsensus against the 
> failure detector. [~mpercy] suggested stopping the failure detector during 
> UpdateReplica() and resuming it when the function returns.
> 3. Move leader failure detection out-of-band of UpdateConsensus entirely.
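>
> A rough sketch of idea 1 (hypothetical names, not an actual implementation): 
> keep a counter of consecutive failed pre-elections, delay the next attempt 
> exponentially, and clear the counter whenever a pre-election succeeds or a 
> real election happens.
>
>     #include <algorithm>
>     #include <chrono>
>     #include <cstdint>
>
>     class PreElectionBackoff {
>      public:
>       // Delay before the next pre-election attempt: 0 after a reset,
>       // then 100ms, 200ms, 400ms, ... capped at 5 seconds.
>       std::chrono::milliseconds NextDelay() const {
>         if (failures_ == 0) return std::chrono::milliseconds(0);
>         uint64_t ms = 100ULL << std::min<uint64_t>(failures_ - 1, 10);
>         return std::chrono::milliseconds(std::min<uint64_t>(ms, 5000));
>       }
>
>       void RecordFailedPreElection() { ++failures_; }
>
>       // Called when a pre-election gathers a majority, or when any real
>       // election occurs. Successful heartbeats deliberately do NOT reset
>       // it, which is what distinguishes this from the existing behavior,
>       // where the heartbeat between pre-elections defeats any backoff.
>       void Reset() { failures_ = 0; }
>
>      private:
>       uint64_t failures_ = 0;
>     };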



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
