Repository: incubator-impala Updated Branches: refs/heads/master 2dcbefc65 -> cde19ab8c
IMPALA-5377: Impala may crash if given a fragment instance while restarting If an Impala daemon restarts quickly enough so that the statestore does not detect a failure, coordinators will still issue fragment instances to that daemon. There's a race in that situation, where the BE service port was opened before ExecEnv::StartServices() had initialized the process-wide memtracker; trying to run the finstance would then crash the process. This patch inverts the order of ExecEnv::StartServices() and starting the backend and client servers. Testing: reproduced by manually injecting sleeps after be_server->Start() and running a query on the cluster. After the patch, sleeping *before or after* be_server->Start() did not trigger a crash, and the query failed as expected if it was executed in that narrow window. No automated testing added: this is a hard bug to hit without sleeps and would need some repeated kill-and-restart loops to have a reasonable chance of triggering. Change-Id: I1d53b36304cb86c43e110e10cf76a39976ae3bd5 Reviewed-on: http://gerrit.cloudera.org:8080/7036 Reviewed-by: Dan Hecht <[email protected]> Tested-by: Impala Public Jenkins Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/f3fdd4d4 Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/f3fdd4d4 Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/f3fdd4d4 Branch: refs/heads/master Commit: f3fdd4d4a8025bab1d2babe9772252f1703a60ee Parents: 2dcbefc Author: Henry Robinson <[email protected]> Authored: Wed May 31 16:37:02 2017 -0700 Committer: Impala Public Jenkins <[email protected]> Committed: Fri Jun 2 04:59:25 2017 +0000 ---------------------------------------------------------------------- be/src/service/impalad-main.cc | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/f3fdd4d4/be/src/service/impalad-main.cc ---------------------------------------------------------------------- diff --git a/be/src/service/impalad-main.cc b/be/src/service/impalad-main.cc index 01e5b55..805e0f2 100644 --- a/be/src/service/impalad-main.cc +++ b/be/src/service/impalad-main.cc @@ -89,13 +89,6 @@ int ImpaladMain(int argc, char** argv) { ABORT_IF_ERROR(CreateImpalaServer(&exec_env, FLAGS_beeswax_port, FLAGS_hs2_port, FLAGS_be_port, &beeswax_server, &hs2_server, &be_server, &server)); - ABORT_IF_ERROR(be_server->Start()); - - if (FLAGS_is_coordinator) { - ABORT_IF_ERROR(beeswax_server->Start()); - ABORT_IF_ERROR(hs2_server->Start()); - } - Status status = exec_env.StartServices(); if (!status.ok()) { LOG(ERROR) << "Impalad services did not start correctly, exiting. Error: " @@ -103,6 +96,16 @@ int ImpaladMain(int argc, char** argv) { ShutdownLogging(); exit(1); } + + DCHECK(exec_env.process_mem_tracker() != nullptr) + << "ExecEnv::StartServices() must be called before starting RPC services"; + ABORT_IF_ERROR(be_server->Start()); + + if (FLAGS_is_coordinator) { + ABORT_IF_ERROR(beeswax_server->Start()); + ABORT_IF_ERROR(hs2_server->Start()); + } + ImpaladMetrics::IMPALA_SERVER_READY->set_value(true); LOG(INFO) << "Impala has started.";
