This is an automated email from the ASF dual-hosted git repository.

alexey pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git


The following commit(s) were added to refs/heads/master by this push:
     new 8ce91854a KUDU-3641 fix flaky TestNewLeaderCantResolvePeers (take 2)
8ce91854a is described below

commit 8ce91854a3ae749ea02c45096dfff4b877050a82
Author: Alexey Serbin <[email protected]>
AuthorDate: Thu Feb 20 17:45:05 2025 -0800

    KUDU-3641 fix flaky TestNewLeaderCantResolvePeers (take 2)
    
    I noticed that even after [1], TestNewLeaderCantResolvePeers was
    still failing about once in 60 runs [2], so I took a closer look.
    
    It turns out StartElection() doesn't trigger a re-election if the
    request arrives at the current Raft leader.  To address that, this
    patch switches from StartElection() back to LeaderStepDown(), but with
    the new target leader being the tablet replica at the third tablet
    server.
    
    This is a follow-up to [1].
    
    [1] https://github.com/apache/kudu/commit/6c77ec875
    [2] http://dist-test.cloudera.org:8080/test_drilldown?test_name=raft_consensus_election-itest
    
    Change-Id: I3bee924353079f7c8bfab6d0d5a6367bd1ee243e
    Reviewed-on: http://gerrit.cloudera.org:8080/22516
    Tested-by: Alexey Serbin <[email protected]>
    Reviewed-by: Yifan Zhang <[email protected]>
---
 src/kudu/integration-tests/raft_consensus_election-itest.cc | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/kudu/integration-tests/raft_consensus_election-itest.cc b/src/kudu/integration-tests/raft_consensus_election-itest.cc
index 06478a0b3..5ccdb61e1 100644
--- a/src/kudu/integration-tests/raft_consensus_election-itest.cc
+++ b/src/kudu/integration-tests/raft_consensus_election-itest.cc
@@ -210,6 +210,8 @@ TEST_F(RaftConsensusElectionITest, TestNewLeaderCantResolvePeers) {
   const auto& tablet_id = tablet_ids[0];
   const auto bad_ts_uuid = ts_iter->second->uuid();
   const auto* second_ts = (++ts_iter)->second;
+  ASSERT_NE(tablet_servers_.end(), ts_iter);
+  const auto* third_ts = (++ts_iter)->second;
 
   // Start failing DNS resolver queries for the selected tablet server during
   // leader election.
@@ -272,8 +274,9 @@ TEST_F(RaftConsensusElectionITest, TestNewLeaderCantResolvePeers) {
   }
   // Cause an election again to trigger a new report to the master. This time
   // the master should place the replica since it has a new tserver available.
-  ASSERT_OK(StartElection(second_ts, tablet_id, kTimeout));
-  ASSERT_OK(WaitUntilLeader(second_ts, tablet_id, kTimeout));
+  ASSERT_OK(LeaderStepDown(
+      second_ts, tablet_id, kTimeout, /*error=*/nullptr, third_ts->uuid()));
+  ASSERT_OK(WaitUntilLeader(third_ts, tablet_id, kTimeout));
   NO_FATALS(cluster_->AssertNoCrashes());
 
   STLDeleteValues(&tablet_servers_);
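
For illustration only, the behavior described in the commit message can be
sketched with a toy model. This is not Kudu code: ToyRaftGroup, its methods,
and the "ts-N" UUIDs are hypothetical, and the model ignores voting, timeouts,
and RPCs. It assumes only what the message states: an election request that
reaches the current leader causes no re-election, while a step-down with a
named successor moves leadership to that successor.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model of a Raft group; not Kudu's consensus implementation.
class ToyRaftGroup {
 public:
  ToyRaftGroup(std::vector<std::string> uuids, std::string leader_uuid)
      : uuids_(std::move(uuids)), leader_uuid_(std::move(leader_uuid)) {}

  // Assumption taken from the commit message: a request that arrives at the
  // current leader is a no-op. Returns true only if leadership changed.
  bool StartElection(const std::string& uuid) {
    if (!IsMember(uuid) || uuid == leader_uuid_) return false;
    leader_uuid_ = uuid;
    ++term_;
    return true;
  }

  // The current leader steps down in favor of a named successor.
  // Returns true only if leadership changed.
  bool LeaderStepDown(const std::string& uuid,
                      const std::string& new_leader_uuid) {
    if (uuid != leader_uuid_ || new_leader_uuid == uuid ||
        !IsMember(new_leader_uuid)) {
      return false;
    }
    leader_uuid_ = new_leader_uuid;
    ++term_;
    return true;
  }

  const std::string& leader() const { return leader_uuid_; }
  int term() const { return term_; }

 private:
  bool IsMember(const std::string& uuid) const {
    return std::find(uuids_.begin(), uuids_.end(), uuid) != uuids_.end();
  }

  std::vector<std::string> uuids_;
  std::string leader_uuid_;
  int term_ = 1;
};
```

In this sketch, if "ts-2" already leads the group, StartElection("ts-2")
leaves the term unchanged, so nothing new would be reported, whereas
LeaderStepDown("ts-2", "ts-3") always advances the term and changes the
leader.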
