Repository: kudu
Updated Branches:
  refs/heads/master 65c1edaf0 -> 681f05b43

KUDU-2037 fix flake in ts_recovery-itest

Fixed flake in the TsRecoveryITest.TestRestartWithOrphanedReplicates
scenario. The write operation timeout was set to 100 ms, and for an
ASAN/TSAN build that was below the reasonable minimum needed for the
majority of write operations to complete successfully.

The issue of bloating the client- and the master-side queues with
GetTableLocations() requests will be addressed separately, with a new
integration test to cover the specific issue (see below).

Prior to the KUDU-1034 fix, the client kept retrying the operation
against the same tablet server again and again, without invalidating
the entry in its meta-cache. After the KUDU-1034 fix, the client
started marking the tserver as failed and switching to another one,
calling GetTableLocations() after every failure since that was the
only available tablet server. In the test scenario, the master was not
responding fast enough to keep up with the rate at which new entries
were added to the client- and the master-side queues, so eventually
the client timed out on the GetTableLocations() calls. As a result,
the expected crash did not happen because there were too few write
operations to trigger the crash of the tablet server.

Having a short write timeout is not essential for the test. Bumping
the write operation timeout from 100 ms to 1000 ms allows the majority
of write operations to succeed even in TSAN/ASAN builds and avoids
needless retries on the client side.

Change-Id: I6c5449dc9b47062ea9389b25a1b9d906d9de64d9
Reviewed-on: http://gerrit.cloudera.org:8080/7138
Tested-by: Alexey Serbin <[email protected]>
Reviewed-by: Todd Lipcon <[email protected]>

Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/681f05b4
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/681f05b4
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/681f05b4

Branch: refs/heads/master
Commit: 681f05b431a6fe62370feb439dd0756d9eefe07d
Parents: 65c1eda
Author: Alexey Serbin <[email protected]>
Authored: Thu Jun 8 20:13:14 2017 -0700
Committer: Todd Lipcon <[email protected]>
Committed: Sat Jun 10 01:18:20 2017 +0000

----------------------------------------------------------------------
 src/kudu/integration-tests/ts_recovery-itest.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/kudu/blob/681f05b4/src/kudu/integration-tests/ts_recovery-itest.cc
----------------------------------------------------------------------
diff --git a/src/kudu/integration-tests/ts_recovery-itest.cc b/src/kudu/integration-tests/ts_recovery-itest.cc
index e996ad1..9391717 100644
--- a/src/kudu/integration-tests/ts_recovery-itest.cc
+++ b/src/kudu/integration-tests/ts_recovery-itest.cc
@@ -86,7 +86,7 @@ TEST_F(TsRecoveryITest, TestRestartWithOrphanedReplicates) {
   TestWorkload work(cluster_.get());
   work.set_num_replicas(1);
   work.set_num_write_threads(4);
-  work.set_write_timeout_millis(100);
+  work.set_write_timeout_millis(1000);
   work.set_timeout_allowed(true);
   work.Setup();
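
As a rough illustration of the queue build-up described in the commit
message, here is a minimal back-of-the-envelope sketch in C++. It is
not Kudu code: LookupBacklog(), the 50 ms lookup service time and the
10 second window are hypothetical, and the model assumes the worst
case in which every write times out and each timeout triggers exactly
one GetTableLocations() call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model, not part of Kudu: estimates how many location
// lookups are still queued after `duration_ms`, assuming every write
// times out and each timeout enqueues exactly one lookup.
int64_t LookupBacklog(int writer_threads,
                      int64_t write_timeout_ms,
                      int64_t lookup_service_ms,
                      int64_t duration_ms) {
  assert(writer_threads > 0);
  assert(write_timeout_ms > 0);
  assert(lookup_service_ms > 0);
  assert(duration_ms >= 0);

  // Each writer thread gives up on one write per timeout interval and
  // enqueues one lookup as a result.
  const int64_t enqueued =
      writer_threads * (duration_ms / write_timeout_ms);

  // Assumption: the master drains one lookup per service interval.
  const int64_t served = duration_ms / lookup_service_ms;

  // The backlog cannot be negative: an idle master serves nothing.
  return std::max<int64_t>(0, enqueued - served);
}
```

Under these assumed numbers, four writer threads with a 100 ms timeout
give LookupBacklog(4, 100, 50, 10000) == 200 queued lookups, while a
1000 ms timeout gives LookupBacklog(4, 1000, 50, 10000) == 0, which
matches the intent of the change: fewer timeouts, fewer retries, no
growing queue.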
