ascherbakoff commented on a change in pull request #481:
URL: https://github.com/apache/ignite-3/pull/481#discussion_r765846260
##########
File path:
modules/raft/src/main/java/org/apache/ignite/raft/jraft/core/NodeImpl.java
##########
@@ -1171,6 +1218,9 @@ private void electSelf() {
}
resetLeaderId(PeerId.emptyPeer(), new
Status(RaftError.ERAFTTIMEDOUT,
"A follower's leader_id is reset to NULL as it begins to
request_vote."));
+ if (!options.getRaftOptions().isStepDownWhenVoteTimedout()) {
Review comment:
Why can't we have a single place for adjustElectionTimeout, for example
before a preVote ?
Also, I see no tests using setStepDownWhenVoteTimedout(false), so this logic
is unclear.
##########
File path:
modules/raft/src/main/java/org/apache/ignite/raft/jraft/core/NodeImpl.java
##########
@@ -606,7 +610,50 @@ private void handleElectionTimeout() {
}
}
}
-
+
+ /**
+ * Method that adjusts election timeout after several consecutive
unsuccessful leader elections.
+ *
+ * Notes about general algorithm: the main idea is that in a stable
cluster election timeout should be relatively small, but when cluster
Review comment:
Raft doesn't know a bit about some "membership layer". The javadoc
should be rewritten in terms of a raft node stability, like long GC pauses or
OS freezes, preventing elections from a completion. Also, I would rewrite this
to using strategy pattern, similar Ignite 2
org.apache.ignite.spi.TimeoutStrategy. The current election tmieout should be
the function of unsuccessful election rounds. Also it would be good to print
election rounds in the log. This strategy must be configurable by NodeOptions.
By default, noop strategy must be used.
##########
File path: modules/raft/src/main/java/org/apache/ignite/internal/raft/Loza.java
##########
@@ -67,7 +67,7 @@
/** Network timeout. */
// TODO: IGNITE-15705 Correct value should be investigated
- private static final int NETWORK_TIMEOUT = 3000;
+ private static final int RPC_TIMEOUT = 3000;
Review comment:
Let's rename TIMEOUT -> RETRY_TIMEOUT
##########
File path:
modules/raft/src/main/java/org/apache/ignite/raft/jraft/option/NodeOptions.java
##########
@@ -49,6 +49,15 @@
// Default: 1200 (1.2s)
private int electionTimeoutMs = 1200; // follower to candidate timeout
+ // The upper bound of the election timeout adjusting. Must be more than
timeout of a membership protocol to remove failed node from
+ // the cluster. In our case, we may assume that 11s could be enough as far
as 11s is greater than suspicion timeout
+ // for the 1000 nodes cluster with ping interval equals to 500ms.
+ // See NodeIml#adjustElectionTimeout for more details about adjusting
election timeouts.
+ public static final int ELECTION_TIMEOUT_MS_MAX = 11_000;
+
+ // Max number of consecutive unsuccessful elections after which election
timeout is adjusted.
+ public static final int MAX_ELECTION_ROUNDS_WITHOUT_ADJUSTING = 3;
+
Review comment:
This looks really bad and must be refactored to strategy options.
##########
File path:
modules/raft/src/integrationTest/java/org/apache/ignite/raft/jraft/core/ItNodeTest.java
##########
@@ -3330,6 +3330,134 @@ public void testBlockedElection() throws Exception {
leader = cluster.getLeader();
LOG.info("Elect new leader is {}, curTerm={}", leader.getLeaderId(),
((NodeImpl) leader).getCurrentTerm());
}
+
+ @Test
+ public void testElectionTimeoutAutoAdjustWhenBlockedAllMesssages() throws
Exception {
+ List<PeerId> peers = TestUtils.generatePeers(3);
+ cluster = new TestCluster("unittest", dataPath, peers, testInfo);
+
+
+ for (PeerId peer : peers)
+ assertTrue(cluster.start(peer.getEndpoint()));
+
+ cluster.waitLeader();
+
+ Node leader = cluster.getLeader();
+
+ int initElectionTimeout = leader.getOptions().getElectionTimeoutMs();
+
+ LOG.warn("Current leader {}, electTimeout={}",
leader.getNodeId().getPeerId(), leader.getOptions().getElectionTimeoutMs());
+
+ List<Node> followers = cluster.getFollowers();
+
+ for (Node follower : followers) {
+ NodeImpl follower0 = (NodeImpl) follower;
+
+ assertEquals(initElectionTimeout,
follower0.getOptions().getElectionTimeoutMs());
+ }
+
+ for (Node follower : followers) {
+ NodeImpl follower0 = (NodeImpl) follower;
+ DefaultRaftClientService rpcService = (DefaultRaftClientService)
follower0.getRpcClientService();
+ RpcClientEx rpcClientEx = (RpcClientEx) rpcService.getRpcClient();
+ rpcClientEx.blockMessages((msg, nodeId) -> true);
+ }
+
+ LOG.warn("Stop leader {}, curTerm={}", leader.getNodeId().getPeerId(),
((NodeImpl) leader).getCurrentTerm());
+
+ assertTrue(cluster.stop(leader.getNodeId().getPeerId().getEndpoint()));
+
+ assertNull(cluster.getLeader());
+
+ assertTrue(waitForCondition(() -> followers.stream().allMatch(f ->
f.getOptions().getElectionTimeoutMs() > initElectionTimeout),
+ (long) NodeOptions.MAX_ELECTION_ROUNDS_WITHOUT_ADJUSTING
+ // need to multiply to 2 because stepDown happens
after voteTimer timeout
+ * (initElectionTimeout +
followers.get(0).getOptions().getRaftOptions().getMaxElectionDelayMs()) * 2));
+
+ for (Node follower : followers) {
+ NodeImpl follower0 = (NodeImpl) follower;
+ DefaultRaftClientService rpcService = (DefaultRaftClientService)
follower0.getRpcClientService();
+ RpcClientEx rpcClientEx = (RpcClientEx) rpcService.getRpcClient();
+ rpcClientEx.stopBlock();
+ }
+
+ // elect new leader
+ cluster.waitLeader();
+ leader = cluster.getLeader();
+
+ LOG.info("Elect new leader is {}, curTerm={}", leader.getLeaderId(),
((NodeImpl) leader).getCurrentTerm());
+
+ assertTrue(waitForCondition(() -> followers.stream().allMatch(f ->
f.getOptions().getElectionTimeoutMs() == initElectionTimeout),
+ 3_000));
+ }
+
+ @Test
+ public void testElectionTimeoutAutoAdjustWhenBlockedRequestVoteMessages()
throws Exception {
Review comment:
This test must be rewritten to avoid copy&paste
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]