This is an automated email from the ASF dual-hosted git repository.
alexey pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git
The following commit(s) were added to refs/heads/master by this push:
new 8ce91854a KUDU-3641 fix flaky TestNewLeaderCantResolvePeers (take 2)
8ce91854a is described below
commit 8ce91854a3ae749ea02c45096dfff4b877050a82
Author: Alexey Serbin <[email protected]>
AuthorDate: Thu Feb 20 17:45:05 2025 -0800
KUDU-3641 fix flaky TestNewLeaderCantResolvePeers (take 2)
I noticed that even after [1] the TestNewLeaderCantResolvePeers scenario
was still failing about once in 60 runs [2], so I took a closer look.
It turns out StartElection() doesn't trigger a re-election if it
arrives at the current Raft leader. To address that, this patch
switches from StartElection() back to LeaderStepDown(), this time with
the tablet replica at the third tablet server as the new target leader.
This is a follow-up to [1].
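The difference between the two calls can be illustrated with a toy model.
This is only a sketch of the behavior described above, not Kudu code:
ToyRaftGroup, ToyReplica, AddReplica() and the "ts-N" UUIDs are
hypothetical names, and the real StartElection() and LeaderStepDown()
test helpers issue RPCs to tablet servers rather than mutating local
state.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: every type and method here is hypothetical.
enum class Role { kFollower, kLeader };

struct ToyReplica {
  std::string uuid;
  Role role;
};

struct ToyRaftGroup {
  std::vector<ToyReplica> replicas;
  int term = 1;

  void AddReplica(const std::string& uuid) {
    ToyReplica r;
    r.uuid = uuid;
    r.role = Role::kFollower;
    replicas.push_back(r);
  }

  ToyReplica* Find(const std::string& uuid) {
    for (auto& r : replicas) {
      if (r.uuid == uuid) return &r;
    }
    return nullptr;
  }

  // Makes 'winner' the only leader and advances the term.
  void Promote(ToyReplica* winner) {
    for (auto& r : replicas) r.role = Role::kFollower;
    winner->role = Role::kLeader;
    ++term;
  }

  // Models the observation from the commit message: a request to start
  // an election that arrives at the current leader changes nothing.
  bool StartElection(const std::string& uuid) {
    ToyReplica* r = Find(uuid);
    if (r == nullptr || r->role == Role::kLeader) return false;
    Promote(r);
    return true;
  }

  // The current leader steps down and the designated replica takes over,
  // so leadership (and the term) is guaranteed to change.
  bool LeaderStepDown(const std::string& leader_uuid,
                      const std::string& new_leader_uuid) {
    ToyReplica* leader = Find(leader_uuid);
    ToyReplica* target = Find(new_leader_uuid);
    if (leader == nullptr || target == nullptr || leader == target ||
        leader->role != Role::kLeader) {
      return false;
    }
    Promote(target);
    return true;
  }
};
```

In this model, calling StartElection("ts-2") while ts-2 already leads
leaves the term unchanged, whereas LeaderStepDown("ts-2", "ts-3") always
moves leadership to ts-3, which is the property the test relies on to
provoke a new report to the master.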
[1] https://github.com/apache/kudu/commit/6c77ec875
[2] http://dist-test.cloudera.org:8080/test_drilldown?test_name=raft_consensus_election-itest
Change-Id: I3bee924353079f7c8bfab6d0d5a6367bd1ee243e
Reviewed-on: http://gerrit.cloudera.org:8080/22516
Tested-by: Alexey Serbin <[email protected]>
Reviewed-by: Yifan Zhang <[email protected]>
---
src/kudu/integration-tests/raft_consensus_election-itest.cc | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/src/kudu/integration-tests/raft_consensus_election-itest.cc b/src/kudu/integration-tests/raft_consensus_election-itest.cc
index 06478a0b3..5ccdb61e1 100644
--- a/src/kudu/integration-tests/raft_consensus_election-itest.cc
+++ b/src/kudu/integration-tests/raft_consensus_election-itest.cc
@@ -210,6 +210,8 @@ TEST_F(RaftConsensusElectionITest, TestNewLeaderCantResolvePeers) {
const auto& tablet_id = tablet_ids[0];
const auto bad_ts_uuid = ts_iter->second->uuid();
const auto* second_ts = (++ts_iter)->second;
+ ASSERT_NE(tablet_servers_.end(), ts_iter);
+ const auto* third_ts = (++ts_iter)->second;
// Start failing DNS resolver queries for the selected tablet server during
// leader election.
@@ -272,8 +274,9 @@ TEST_F(RaftConsensusElectionITest, TestNewLeaderCantResolvePeers) {
}
// Cause an election again to trigger a new report to the master. This time
// the master should place the replica since it has a new tserver available.
- ASSERT_OK(StartElection(second_ts, tablet_id, kTimeout));
- ASSERT_OK(WaitUntilLeader(second_ts, tablet_id, kTimeout));
+ ASSERT_OK(LeaderStepDown(
+ second_ts, tablet_id, kTimeout, /*error=*/nullptr, third_ts->uuid()));
+ ASSERT_OK(WaitUntilLeader(third_ts, tablet_id, kTimeout));
NO_FATALS(cluster_->AssertNoCrashes());
STLDeleteValues(&tablet_servers_);