[
https://issues.apache.org/jira/browse/KUDU-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adar Dembo updated KUDU-2815:
-----------------------------
Attachment: raft_consensus_nonvoter-itest.txt
Saw this again. I agree with Will's assessment, though it's not clear to me
whether the fix belongs in production code (e.g. maybe the vote request
timeout shouldn't be coupled to the failure detection duration, or maybe timed
out vote requests should be retried) or in test code. Probably the latter, but
I wanted to float an alternative.
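
To make the two options concrete, here is a toy model of the failure mode.
It is not Kudu code: Voter, run_election, and the latency values are
hypothetical, and the only numbers taken from the issue are the 170ms timeout
and the suggested 5s. It assumes every voter would vote yes and that a vote
only counts if the response arrives before the request times out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Voter:
    # Hypothetical response latency (ms) for each successive vote request
    # sent to this voter; the last value is reused for any further attempts.
    latencies_ms: List[int]


def run_election(voters: List[Voter], timeout_ms: int, max_attempts: int = 1) -> bool:
    """Return True if the candidate collects a majority of yes votes.

    Toy model: each voter always votes yes, but the vote is only counted
    if its response arrives within timeout_ms on some attempt.
    """
    total = len(voters) + 1          # the other voters plus the candidate
    majority = total // 2 + 1
    yes_votes = 1                    # the candidate votes for itself
    for voter in voters:
        for attempt in range(max_attempts):
            idx = min(attempt, len(voter.latencies_ms) - 1)
            if voter.latencies_ms[idx] <= timeout_ms:
                yes_votes += 1
                break                # vote received; stop retrying this voter
    return yes_votes >= majority


# Two other voters whose first response is slow and whose second is fast.
slow_then_fast = [Voter([200, 20]), Voter([250, 30])]

lost_with_short_timeout = run_election(slow_then_fast, timeout_ms=170)
won_with_long_timeout = run_election(slow_then_fast, timeout_ms=5000)
won_with_retry = run_election(slow_then_fast, timeout_ms=170, max_attempts=2)
```

In this sketch, either raising the timeout or retrying the timed-out requests
lets the same election succeed, which is why both fixes seem plausible.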
> RaftConsensusNonVoterITest.PromoteAndDemote fails if manually-run election
> fails.
> ---------------------------------------------------------------------------------
>
> Key: KUDU-2815
> URL: https://issues.apache.org/jira/browse/KUDU-2815
> Project: Kudu
> Issue Type: Bug
> Affects Versions: 1.9.0
> Reporter: Will Berkeley
> Priority: Major
> Attachments: raft_consensus_nonvoter-itest.txt,
> raft_consensus_nonvoter-itest.txt
>
>
> RaftConsensusNonVoterITest.PromoteAndDemote disables normal leader elections
> and runs an election manually, to avoid some previous flakiness.
> Unfortunately, this introduces new flakiness: on rare occasions, the manual
> election fails because the vote requests time out. The candidate concludes
> it has lost the election, and only afterwards do the two other voters vote
> yes. The timeout for vote requests is 170ms, which is pretty short. If it
> were raised to, say, 5s, the test would probably not be flaky anymore.