This is an automated email from the ASF dual-hosted git repository.

alexey pushed a commit to branch branch-1.18.x
in repository https://gitbox.apache.org/repos/asf/kudu.git


The following commit(s) were added to refs/heads/branch-1.18.x by this push:
     new 8180345b0 KUDU-3641 fix flaky TestNewLeaderCantResolvePeers (take 2)
8180345b0 is described below

commit 8180345b05089db3fb508698bcc5b6e5147ba377
Author: Alexey Serbin <[email protected]>
AuthorDate: Thu Feb 20 17:45:05 2025 -0800

    KUDU-3641 fix flaky TestNewLeaderCantResolvePeers (take 2)
    
    I noticed that even after [1], TestNewLeaderCantResolvePeers was still
    failing about once in 60 runs [2], so I took a closer look.
    
    It turns out that StartElection() doesn't trigger a re-election when it
    arrives at the current Raft leader.  To address that, this patch
    switches back from StartElection() to LeaderStepDown(), this time with
    the new target leader being the tablet replica at the third tablet
    server.
    
    This is a follow-up to [1].
    
    [1] https://github.com/apache/kudu/commit/6c77ec875
    [2] http://dist-test.cloudera.org:8080/test_drilldown?test_name=raft_consensus_election-itest
    
    Change-Id: I3bee924353079f7c8bfab6d0d5a6367bd1ee243e
    Reviewed-on: http://gerrit.cloudera.org:8080/22516
    Tested-by: Alexey Serbin <[email protected]>
    Reviewed-by: Yifan Zhang <[email protected]>
    (cherry picked from commit 8ce91854a3ae749ea02c45096dfff4b877050a82)
    Reviewed-on: http://gerrit.cloudera.org:8080/22517
---
 src/kudu/integration-tests/raft_consensus_election-itest.cc | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/kudu/integration-tests/raft_consensus_election-itest.cc b/src/kudu/integration-tests/raft_consensus_election-itest.cc
index 06478a0b3..5ccdb61e1 100644
--- a/src/kudu/integration-tests/raft_consensus_election-itest.cc
+++ b/src/kudu/integration-tests/raft_consensus_election-itest.cc
@@ -210,6 +210,8 @@ TEST_F(RaftConsensusElectionITest, TestNewLeaderCantResolvePeers) {
   const auto& tablet_id = tablet_ids[0];
   const auto bad_ts_uuid = ts_iter->second->uuid();
   const auto* second_ts = (++ts_iter)->second;
+  ASSERT_NE(tablet_servers_.end(), ts_iter);
+  const auto* third_ts = (++ts_iter)->second;
 
   // Start failing DNS resolver queries for the selected tablet server during
   // leader election.
@@ -272,8 +274,9 @@ TEST_F(RaftConsensusElectionITest, TestNewLeaderCantResolvePeers) {
   }
   // Cause an election again to trigger a new report to the master. This time
   // the master should place the replica since it has a new tserver available.
-  ASSERT_OK(StartElection(second_ts, tablet_id, kTimeout));
-  ASSERT_OK(WaitUntilLeader(second_ts, tablet_id, kTimeout));
+  ASSERT_OK(LeaderStepDown(
+      second_ts, tablet_id, kTimeout, /*error=*/nullptr, third_ts->uuid()));
+  ASSERT_OK(WaitUntilLeader(third_ts, tablet_id, kTimeout));
   NO_FATALS(cluster_->AssertNoCrashes());
 
   STLDeleteValues(&tablet_servers_);
