Repository: kudu Updated Branches: refs/heads/master eec20423b -> 240695332
raft_consensus_nonvoter-itest: deflake a bit I saw a failure in ReplicaBehindWalGcThresholdITest.ReplicaReplacement (GetParam() was (1, false)) just after the master was restarted: raft_consensus_nonvoter-itest.cc:2070: Failure Failed Bad status: Service unavailable: Leader not yet ready to serve requests This is odd as there's a WaitForCatalogManager() call in there, so why would a subsequent GetTabletLocations RPC return this ServiceUnavailable? As best I can tell, the only way for this to happen is if the attempt to grab the leadership lock from within the ListTables RPC (sent from WaitForCatalogManager()) returns IllegalState, which it'll do if the UUID in the master's cstate doesn't match the UUID on disk. Perhaps this can happen during a leader master election; maybe the cstate's UUID becomes empty for a little while? If that's true, this should fix the problem by considering IllegalState to be a non-final state and continuing the loop. I couldn't repro this failure, but Alexey managed to do so in a dist-test loop with special latency injection enabled. Without the fix, 93 out of 256 runs failed, and with the fix, none failed. Change-Id: I8192bd669e7e309943ea82718dd715238d520bbd Reviewed-on: http://gerrit.cloudera.org:8080/11918 Reviewed-by: Alexey Serbin <[email protected]> Tested-by: Kudu Jenkins Project: http://git-wip-us.apache.org/repos/asf/kudu/repo Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/24069533 Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/24069533 Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/24069533 Branch: refs/heads/master Commit: 240695332bef5e3e50d4473ad4c19f7b4786bc1b Parents: eec2042 Author: Adar Dembo <[email protected]> Authored: Fri Nov 9 11:03:26 2018 -0800 Committer: Adar Dembo <[email protected]> Committed: Wed Nov 14 23:42:31 2018 +0000 ---------------------------------------------------------------------- .../raft_consensus_nonvoter-itest.cc | 2 +- src/kudu/mini-cluster/external_mini_cluster.cc | 13 ++++++++----- src/kudu/mini-cluster/external_mini_cluster.h | 14 ++++++++++++-- 3 files changed, 21 insertions(+), 8 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/kudu/blob/24069533/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc ---------------------------------------------------------------------- diff --git a/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc b/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc index de29869..2a11642 100644 --- a/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc +++ b/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc @@ -2055,7 +2055,7 @@ TEST_P(ReplicaBehindWalGcThresholdITest, ReplicaReplacement) { "--master_add_server_when_underreplicated=true"); master->Shutdown(); ASSERT_OK(master->Restart()); - ASSERT_OK(master->WaitForCatalogManager()); + ASSERT_OK(master->WaitForCatalogManager(ExternalMaster::WAIT_FOR_LEADERSHIP)); } } http://git-wip-us.apache.org/repos/asf/kudu/blob/24069533/src/kudu/mini-cluster/external_mini_cluster.cc ---------------------------------------------------------------------- diff --git a/src/kudu/mini-cluster/external_mini_cluster.cc b/src/kudu/mini-cluster/external_mini_cluster.cc index 4fbeb04..a7e3d6b 100644 --- a/src/kudu/mini-cluster/external_mini_cluster.cc +++ b/src/kudu/mini-cluster/external_mini_cluster.cc @@ -1189,7 +1189,7 @@ Status ExternalMaster::Restart() { return StartProcess(flags); } -Status ExternalMaster::WaitForCatalogManager() { +Status ExternalMaster::WaitForCatalogManager(WaitMode wait_mode) { unique_ptr<MasterServiceProxy> proxy(new MasterServiceProxy( opts_.messenger, bound_rpc_addr(), bound_rpc_addr().host())); Stopwatch sw; @@ -1206,10 +1206,13 @@ Status ExternalMaster::WaitForCatalogManager() { } s = StatusFromPB(resp.error().status()); if (s.IsIllegalState()) { - // This master is not the leader but is otherwise up and running. - break; - } - if (!s.IsServiceUnavailable()) { + if (wait_mode == DONT_WAIT_FOR_LEADERSHIP) { + // This master is not the leader but is otherwise up and running. + break; + } + DCHECK_EQ(wait_mode, WAIT_FOR_LEADERSHIP); + // Continue to the sleep below. + } else if (!s.IsServiceUnavailable()) { // Unexpected error from master. return s; } http://git-wip-us.apache.org/repos/asf/kudu/blob/24069533/src/kudu/mini-cluster/external_mini_cluster.h ---------------------------------------------------------------------- diff --git a/src/kudu/mini-cluster/external_mini_cluster.h b/src/kudu/mini-cluster/external_mini_cluster.h index 9dc2f87..8addfff 100644 --- a/src/kudu/mini-cluster/external_mini_cluster.h +++ b/src/kudu/mini-cluster/external_mini_cluster.h @@ -591,8 +591,18 @@ class ExternalMaster : public ExternalDaemon { virtual Status Restart() override WARN_UNUSED_RESULT; // Blocks until the master's catalog manager is initialized and responding to - // RPCs. - Status WaitForCatalogManager() WARN_UNUSED_RESULT; + // RPCs. If 'wait_mode' is WAIT_FOR_LEADERSHIP, will further block until the + // master has been elected leader. + // + // WAIT_FOR_LEADERSHIP should only be used in single-master environments; the + // wait may time out in a multi-master environment as there's no guarantee + // that this particular master will be elected leader. + enum WaitMode { + WAIT_FOR_LEADERSHIP, + DONT_WAIT_FOR_LEADERSHIP + }; + Status WaitForCatalogManager( + WaitMode wait_mode = DONT_WAIT_FOR_LEADERSHIP) WARN_UNUSED_RESULT; private: std::vector<std::string> GetCommonFlags() const;
