This is an automated email from the ASF dual-hosted git repository.
alexey pushed a commit to branch branch-1.18.x
in repository https://gitbox.apache.org/repos/asf/kudu.git
The following commit(s) were added to refs/heads/branch-1.18.x by this push:
new 8180345b0 KUDU-3641 fix flaky TestNewLeaderCantResolvePeers (take 2)
8180345b0 is described below
commit 8180345b05089db3fb508698bcc5b6e5147ba377
Author: Alexey Serbin <[email protected]>
AuthorDate: Thu Feb 20 17:45:05 2025 -0800
KUDU-3641 fix flaky TestNewLeaderCantResolvePeers (take 2)
I noticed that even after [1], TestNewLeaderCantResolvePeers was
still failing about once in 60 runs [2], so I took a closer look.
It turns out that StartElection() doesn't trigger a new election if the
request arrives at a replica that is already the Raft leader. To address
that, this patch replaces StartElection() with LeaderStepDown() again,
but this time it designates the tablet replica at the third tablet
server as the new target leader.
This is a follow-up to [1].
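To make the failure mode concrete, here is a toy C++ model of the two
calls. It is a minimal sketch under stated assumptions: ToyRaftGroup and
its methods are hypothetical, they are not Kudu's consensus API or the
itest helpers StartElection()/LeaderStepDown(), and elections are
simplified so that the candidate always wins. It only encodes the
behavior described above: a start-election request sent to the current
leader changes nothing, while a step-down request that names a target
moves leadership to that target.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, heavily simplified model of a Raft group. This is NOT
// Kudu's consensus implementation; it only mirrors the behavior described
// in the commit message.
class ToyRaftGroup {
 public:
  explicit ToyRaftGroup(std::string leader) : leader_(std::move(leader)) {}

  const std::string& leader() const { return leader_; }
  long term() const { return term_; }

  // Asking the current leader to start an election is a no-op: no new
  // term begins, so nothing new would be reported to the master.
  void StartElection(const std::string& uuid) {
    if (uuid == leader_) {
      return;  // Already the leader: nothing happens.
    }
    ++term_;
    leader_ = uuid;  // Simplification: the candidate always wins.
  }

  // A step-down request sent to the current leader, naming the replica
  // that should take over. Returns false if `uuid` is not the leader.
  bool LeaderStepDown(const std::string& uuid, const std::string& new_leader) {
    if (uuid != leader_) {
      return false;  // Only the current leader can step down.
    }
    ++term_;              // Assumption: the transfer starts a new term.
    leader_ = new_leader;  // Leadership moves to the designated target.
    return true;
  }

 private:
  std::string leader_;
  long term_ = 1;
};
```

For example, with "ts-2" as the leader, StartElection("ts-2") leaves the
term at 1, while LeaderStepDown("ts-2", "ts-3") makes "ts-3" the leader
in term 2, which is the situation the WaitUntilLeader(third_ts, ...)
check in the diff waits for.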
[1] https://github.com/apache/kudu/commit/6c77ec875
[2] http://dist-test.cloudera.org:8080/test_drilldown?test_name=raft_consensus_election-itest
Change-Id: I3bee924353079f7c8bfab6d0d5a6367bd1ee243e
Reviewed-on: http://gerrit.cloudera.org:8080/22516
Tested-by: Alexey Serbin <[email protected]>
Reviewed-by: Yifan Zhang <[email protected]>
(cherry picked from commit 8ce91854a3ae749ea02c45096dfff4b877050a82)
Reviewed-on: http://gerrit.cloudera.org:8080/22517
---
src/kudu/integration-tests/raft_consensus_election-itest.cc | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
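The first hunk below picks the second and third tablet servers by
advancing an iterator over tablet_servers_ and adds an ASSERT_NE() check
against end(). The following is a hypothetical C++ sketch of the same
kind of selection with an end() check before every dereference.
FakeTServer and PickFirstN() are illustrative names only, and std::map
stands in for whatever associative container the test actually uses
(the STLDeleteValues() call suggests one holding owned pointers).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a tablet server descriptor.
struct FakeTServer {
  std::string uuid;
};

// Returns the first `n` servers in key order, or an empty vector if the
// map holds fewer than `n` entries. The loop condition compares against
// end() before each dereference, so it never reads past the last entry.
std::vector<const FakeTServer*> PickFirstN(
    const std::map<std::string, FakeTServer*>& servers, std::size_t n) {
  std::vector<const FakeTServer*> picked;
  for (auto it = servers.begin(); it != servers.end() && picked.size() < n;
       ++it) {
    picked.push_back(it->second);
  }
  if (picked.size() < n) {
    picked.clear();  // Not enough servers: report failure as empty.
  }
  return picked;
}
```

Returning an empty vector lets a caller turn "too few servers" into a
single explicit test failure instead of undefined behavior from
dereferencing end().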
diff --git a/src/kudu/integration-tests/raft_consensus_election-itest.cc b/src/kudu/integration-tests/raft_consensus_election-itest.cc
index 06478a0b3..5ccdb61e1 100644
--- a/src/kudu/integration-tests/raft_consensus_election-itest.cc
+++ b/src/kudu/integration-tests/raft_consensus_election-itest.cc
@@ -210,6 +210,8 @@ TEST_F(RaftConsensusElectionITest, TestNewLeaderCantResolvePeers) {
const auto& tablet_id = tablet_ids[0];
const auto bad_ts_uuid = ts_iter->second->uuid();
const auto* second_ts = (++ts_iter)->second;
+ ASSERT_NE(tablet_servers_.end(), ts_iter);
+ const auto* third_ts = (++ts_iter)->second;
// Start failing DNS resolver queries for the selected tablet server during
// leader election.
@@ -272,8 +274,9 @@ TEST_F(RaftConsensusElectionITest, TestNewLeaderCantResolvePeers) {
}
// Cause an election again to trigger a new report to the master. This time
// the master should place the replica since it has a new tserver available.
- ASSERT_OK(StartElection(second_ts, tablet_id, kTimeout));
- ASSERT_OK(WaitUntilLeader(second_ts, tablet_id, kTimeout));
+ ASSERT_OK(LeaderStepDown(
+ second_ts, tablet_id, kTimeout, /*error=*/nullptr, third_ts->uuid()));
+ ASSERT_OK(WaitUntilLeader(third_ts, tablet_id, kTimeout));
NO_FATALS(cluster_->AssertNoCrashes());
STLDeleteValues(&tablet_servers_);