delete_table-test: fix flakiness with table creation timeout

This test was timing out frequently when trying to create a
replication-2 table on a cluster with 3 tservers, one of which was
recently shut down. The master could try to place a replica on the
non-running server, which would then take some time to time out and
try a new placement.
The workaround here is to restart the master so it no longer sees the
crashed server as a valid placement option.

Change-Id: Ic61ad384e1b247f83bfc709528c4c7bda586c9d2
Reviewed-on: http://gerrit.cloudera.org:8080/4632
Reviewed-by: David Ribeiro Alves <[email protected]>
Reviewed-by: Dinesh Bhat <[email protected]>
Tested-by: Kudu Jenkins


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/98f42cdd
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/98f42cdd
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/98f42cdd

Branch: refs/heads/master
Commit: 98f42cdd878caa429377625a2288d22ed0d114f2
Parents: 0f99d40
Author: Todd Lipcon <[email protected]>
Authored: Wed Oct 5 10:52:29 2016 -0700
Committer: David Ribeiro Alves <[email protected]>
Committed: Wed Oct 5 20:26:40 2016 +0000

----------------------------------------------------------------------
 src/kudu/integration-tests/delete_table-test.cc | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/98f42cdd/src/kudu/integration-tests/delete_table-test.cc
----------------------------------------------------------------------
diff --git a/src/kudu/integration-tests/delete_table-test.cc b/src/kudu/integration-tests/delete_table-test.cc
index 6a0de2f..d331d43 100644
--- a/src/kudu/integration-tests/delete_table-test.cc
+++ b/src/kudu/integration-tests/delete_table-test.cc
@@ -432,7 +432,7 @@ TEST_F(DeleteTableTest, TestAutoTombstoneAfterCrashDuringTabletCopy) {
   ASSERT_OK(cluster_->master()->Restart());
   ASSERT_OK(cluster_->WaitForTabletServerCount(1, MonoDelta::FromSeconds(30)));
 
-  // Set up a table which has a table only on TS 0. This will be used to test for
+  // Set up a table which has a tablet only on TS 0. This will be used to test for
   // "collateral damage" bugs where incorrect handling of the main test tablet
   // accidentally removes blocks from another tablet.
   // We use a sequential workload so that we just flush and don't compact.
@@ -467,7 +467,15 @@ TEST_F(DeleteTableTest, TestAutoTombstoneAfterCrashDuringTabletCopy) {
   ASSERT_OK(cluster_->tablet_server(2)->Restart());
   cluster_->tablet_server(kTsIndex)->Shutdown();
 
-  // Create a new tablet which is replicated on the other two servers.
+  // Restart the master to be sure that it only sees the live servers.
+  // Otherwise it may try to create a tablet with a replica on the down server.
+  // The table creation would eventually succeed after picking a different set of
+  // replicas, but not before causing a timeout.
+  cluster_->master()->Shutdown();
+  ASSERT_OK(cluster_->master()->Restart());
+  ASSERT_OK(cluster_->WaitForTabletServerCount(2, MonoDelta::FromSeconds(30)));
+
+  // Create a new table with a single tablet replicated on the other two servers.
   // We use the same sequential workload. This produces block ID sequences
   // that look like:
   // TS 0: |---- blocks from 'other-table' ---]
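For reference, the operation that was hitting the timeout is a plain
replication-2 table create. A minimal sketch of the equivalent call
against the Kudu C++ client API — the table name, schema, and helper
function below are illustrative only and are not part of this patch;
the test itself drives table creation through its workload helper:

  #include <memory>
  #include <string>
  #include <vector>

  #include "kudu/client/client.h"

  using kudu::client::KuduClient;
  using kudu::client::KuduColumnSchema;
  using kudu::client::KuduSchema;
  using kudu::client::KuduSchemaBuilder;
  using kudu::client::KuduTableCreator;
  using kudu::client::sp::shared_ptr;

  // Create a replication-2 table (hypothetical example). With three
  // registered tablet servers, the master may pick any two of them for
  // the replicas; if a chosen server is actually down, the create
  // stalls until that placement times out and a different set of
  // replicas is tried — hence the flakiness this patch works around.
  kudu::Status CreateTwoReplicaTable(const shared_ptr<KuduClient>& client) {
    KuduSchemaBuilder b;
    b.AddColumn("key")->Type(KuduColumnSchema::INT32)->NotNull()->PrimaryKey();
    KuduSchema schema;
    kudu::Status s = b.Build(&schema);
    if (!s.ok()) return s;

    std::unique_ptr<KuduTableCreator> creator(client->NewTableCreator());
    return creator->table_name("example-table")
        .schema(&schema)
        .set_range_partition_columns({ "key" })
        .num_replicas(2)
        .Create();
  }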
