Repository: kudu
Updated Branches:
  refs/heads/master b315d0ed0 -> 7b6cc8459


test: deflake TestRestartWithDifferentUUID

The test verifies that failed tablet replicas are eventually
re-replicated. This did not always work: with so many tablets per tablet
server, the servers were too busy handling delete tablet RPCs to begin
re-replicating replicas in time. The result was a very flaky test
(75 failures in 100 TSAN runs).

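This patch reduces the number of tablets from 25 to 5 and changes the
number of tablet servers from 5 to 3 (4 when
--raft_prepare_replacement_before_eviction is enabled) so that tablet
copies can begin in a timely manner on the restarted tablet server. With
this change the test passed 1000/1000 runs in TSAN.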
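
To make the load reduction concrete, here is a rough sketch of the
per-server replica arithmetic. It assumes a replication factor of 3 and an
even spread of replicas across servers; neither is stated in this message.
The helper names are hypothetical and are not part of Kudu.

```cpp
// Illustrative only: these helpers are hypothetical, not Kudu APIs.

// Average number of replicas each tablet server hosts, assuming replicas
// are spread evenly across the servers.
constexpr double AvgReplicasPerServer(int num_tablets,
                                      int replication_factor,
                                      int num_tablet_servers) {
  return static_cast<double>(num_tablets) * replication_factor /
         num_tablet_servers;
}

// Mirrors the server-count choice made in this patch.
constexpr int NumTabletServers(bool prepare_replacement_before_eviction) {
  return prepare_replacement_before_eviction ? 4 : 3;
}

// Before the patch: 25 tablets on 5 servers (assumed replication factor 3).
constexpr double kBefore = AvgReplicasPerServer(25, 3, 5);
// After the patch, with replacement-before-eviction disabled and enabled.
constexpr double kAfterEvictFirst =
    AvgReplicasPerServer(5, 3, NumTabletServers(false));
constexpr double kAfterReplaceFirst =
    AvgReplicasPerServer(5, 3, NumTabletServers(true));
```

Under these assumptions each server goes from about 15 replicas to 5 or
fewer, so far fewer delete tablet RPCs compete with the tablet copies.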

Change-Id: Ice109fe9073e53f1651b30dced200f2cf12e7249
Reviewed-on: http://gerrit.cloudera.org:8080/9190
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <davidral...@gmail.com>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/07834678
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/07834678
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/07834678

Branch: refs/heads/master
Commit: 078346787f0a7b90c0b9b85664b609b4aa96eab7
Parents: b315d0e
Author: Andrew Wong <aw...@cloudera.com>
Authored: Thu Feb 1 20:35:24 2018 -0800
Committer: David Ribeiro Alves <davidral...@gmail.com>
Committed: Fri Feb 2 07:24:52 2018 +0000

----------------------------------------------------------------------
 src/kudu/integration-tests/raft_consensus-itest.cc | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/07834678/src/kudu/integration-tests/raft_consensus-itest.cc
----------------------------------------------------------------------
diff --git a/src/kudu/integration-tests/raft_consensus-itest.cc b/src/kudu/integration-tests/raft_consensus-itest.cc
index 4486a63..7bd4f16 100644
--- a/src/kudu/integration-tests/raft_consensus-itest.cc
+++ b/src/kudu/integration-tests/raft_consensus-itest.cc
@@ -2753,19 +2753,19 @@ TEST_F(RaftConsensusITest, TestLogIOErrorIsFatal) {
 // re-replicated.
 TEST_P(RaftConsensusParamReplicationModesITest, TestRestartWithDifferentUUID) {
   // Start a cluster and insert data.
-  const bool prepare_replacement_before_eviction = GetParam();
+  const bool kPrepareReplacementBeforeEviction = GetParam();
   ExternalMiniClusterOptions opts;
-  opts.num_tablet_servers = 5;
+  opts.num_tablet_servers = kPrepareReplacementBeforeEviction ? 4 : 3;
   opts.extra_tserver_flags = {
     // Set a low timeout. If we can't re-replicate in a reasonable amount of
     // time, it means we're not evicting at all.
     "--follower_unavailable_considered_failed_sec=10",
     Substitute("--raft_prepare_replacement_before_eviction=$0",
-               prepare_replacement_before_eviction),
+               kPrepareReplacementBeforeEviction),
   };
   opts.extra_master_flags = {
     Substitute("--raft_prepare_replacement_before_eviction=$0",
-               prepare_replacement_before_eviction),
+               kPrepareReplacementBeforeEviction),
   };
   cluster_.reset(new ExternalMiniCluster(std::move(opts)));
   ASSERT_OK(cluster_->Start());
@@ -2773,7 +2773,7 @@ TEST_P(RaftConsensusParamReplicationModesITest, TestRestartWithDifferentUUID) {
   // Write some data. In writing many tablets, we're making it more likely that
   // all tablet servers will have some tablets on them.
   TestWorkload writes(cluster_.get());
-  writes.set_num_tablets(25);
+  writes.set_num_tablets(5);
   writes.set_timeout_allowed(true);
   writes.Setup();
   writes.Start();
