[ https://issues.apache.org/jira/browse/KUDU-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030205#comment-17030205 ]
ASF subversion and git services commented on KUDU-2149:
-------------------------------------------------------
Commit b32283d2e5ce3d88e4f6afdeedf1c616721cce3a in kudu's branch
refs/heads/master from Adar Dembo
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=b32283d ]
KUDU-2155: disable failure detector around elections
This is a more complete fix for KUDU-2149 which disables the failure
detector completely around a leader election.
There are several changes to make this happen:
1. The FD is changed to use a one-shot timer, which automatically disables
upon firing.
2. Because all elections are guaranteed to reach DoElectionCallback, that's
where we reenable the FD.
3. We provide a special case for pre-elections where FD reenabling is
deferred until after the subsequent real election finishes.
I'm still not convinced this is the cleanest approach, but it seems to work.
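
The three changes listed above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Kudu's implementation (which is C++): the class and method names (OneShotFailureDetector, Replica, finish_election) are hypothetical, time is an explicit integer clock instead of a real timer, and finish_election merely stands in for the role the commit message assigns to DoElectionCallback.

```python
class OneShotFailureDetector:
    """Toy failure detector driven by an explicit clock (hypothetical API)."""

    def __init__(self, timeout, on_failure):
        self.timeout = timeout
        self.on_failure = on_failure
        self.deadline = None  # None means the detector is disabled

    def enable(self, now):
        self.deadline = now + self.timeout

    def enabled(self):
        return self.deadline is not None

    def tick(self, now):
        # One-shot behavior: disable before firing, so the detector cannot
        # fire again until someone explicitly re-enables it.
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            self.on_failure(now)


class Replica:
    """Toy replica; finish_election stands in for the election callback."""

    def __init__(self, timeout):
        self.fd = OneShotFailureDetector(timeout, self.start_election)
        self.pending = []    # kinds of elections started but not finished
        self.elections = []  # (kind, start_time) for every election started

    def start_election(self, now, pre_election=True):
        kind = "pre" if pre_election else "real"
        self.pending.append(kind)
        self.elections.append((kind, now))

    def finish_election(self, now, won_pre_election=True):
        kind = self.pending.pop(0)
        if kind == "pre" and won_pre_election:
            # Special case: defer re-enabling until the real election ends.
            self.start_election(now, pre_election=False)
        else:
            self.fd.enable(now)


def demo():
    replica = Replica(timeout=5)
    replica.fd.enable(now=0)
    # The election result is slow to arrive; keep ticking the detector.
    for now in range(1, 21):
        replica.fd.tick(now)
    started_while_waiting = len(replica.elections)
    replica.finish_election(now=21)   # pre-election won, real one starts
    fd_after_pre = replica.fd.enabled()
    replica.finish_election(now=22)   # real election finishes
    fd_after_real = replica.fd.enabled()
    return started_while_waiting, fd_after_pre, fd_after_real
```

In this model, calling demo() starts exactly one election while the result is outstanding, and the detector is re-enabled only after the real election finishes, not after the pre-election.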
Change-Id: Idcd311cee028c48e908f290d60c474e8a4557d97
Reviewed-on: http://gerrit.cloudera.org:8080/8134
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin
Reviewed-by: Andrew Wong
> New failure detector implementation can lead to election stacking
> -----------------------------------------------------------------
>
> Key: KUDU-2149
> URL: https://issues.apache.org/jira/browse/KUDU-2149
> Project: Kudu
> Issue Type: Bug
> Components: consensus
> Affects Versions: 1.5.0
> Reporter: Adar Dembo
> Assignee: Adar Dembo
> Priority: Critical
> Fix For: 1.6.0
>
>
> A new failure detector (FD) implementation was merged in commit 21b0f3d and
> is part of Kudu 1.5. One of the key changes is that the detection logic runs
> on a reactor thread rather than on a dedicated per-replica thread. But,
> because reactor threads are shared, the election started in the event of a
> failure must be thunked to the Raft thread pool (starting an election means
> casting a vote, which generally means performing IO, which is verboten on a
> reactor thread).
> Because the election is thunked, the FD rearms immediately; the previous
> implementation did not do this. If there's a lot of outstanding IO (e.g. during
> an election storm across thousands of tablets), it's possible for the FD to
> fire again while the first election task is still waiting to cast its vote. The
> new election task will try to acquire the consensus lock and block on it (it's
> held by the first election task), and each later firing queues up another
> blocked task. When the original IO finally completes, all of the follow-on
> elections will get unblocked at the same time.
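
The stacking described in the issue can be made concrete with a rough count. The sketch below is a toy model with assumed names and parameters (queued_elections, io_delay, fd_timeout, rearm_on_fire), not Kudu code: it counts how many election tasks are submitted while IO stays outstanding for io_delay clock ticks, comparing a detector that rearms as soon as it fires with a one-shot detector.

```python
def queued_elections(io_delay, fd_timeout, rearm_on_fire):
    """Count election tasks submitted while IO is outstanding.

    Hypothetical model: the detector is armed at time 0 and no election
    can complete until time io_delay.
    """
    queued = 0
    deadline = fd_timeout
    for now in range(1, io_delay + 1):
        if deadline is not None and now >= deadline:
            queued += 1
            # A rearming detector schedules its next firing right away;
            # a one-shot detector stays disabled until re-enabled elsewhere.
            deadline = now + fd_timeout if rearm_on_fire else None
    return queued


stacked = queued_elections(io_delay=30, fd_timeout=5, rearm_on_fire=True)
one_shot = queued_elections(io_delay=30, fd_timeout=5, rearm_on_fire=False)
```

With these illustrative numbers the rearming detector queues six election tasks (at ticks 5, 10, 15, 20, 25 and 30), all of which would unblock together once the IO completes, while the one-shot detector queues one.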