Repository: incubator-impala
Updated Branches:
  refs/heads/master 2dcbefc65 -> cde19ab8c


IMPALA-5377: Impala may crash if given a fragment instance while restarting

If an Impala daemon restarts quickly enough so that the statestore does
not detect a failure, coordinators will still issue fragment instances
to that daemon. There's a race in that situation, where the BE service
port was opened before ExecEnv::StartServices() had initialized the
process-wide memtracker; trying to run the finstance would then crash
the process.

This patch inverts the order of ExecEnv::StartServices() and starting
the backend and client servers.

Testing: reproduced by manually injecting sleeps after
be_server->Start() and running a query on the cluster. After the patch,
sleeping *before or after* be_server->Start() did not trigger a crash,
and the query failed as expected if it was executed in that narrow
window.

No automated testing added: this is a hard bug to hit without sleeps and
would need some repeated kill-and-restart loops to have a reasonable
chance of triggering.

Change-Id: I1d53b36304cb86c43e110e10cf76a39976ae3bd5
Reviewed-on: http://gerrit.cloudera.org:8080/7036
Reviewed-by: Dan Hecht <[email protected]>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/f3fdd4d4
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/f3fdd4d4
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/f3fdd4d4

Branch: refs/heads/master
Commit: f3fdd4d4a8025bab1d2babe9772252f1703a60ee
Parents: 2dcbefc
Author: Henry Robinson <[email protected]>
Authored: Wed May 31 16:37:02 2017 -0700
Committer: Impala Public Jenkins <[email protected]>
Committed: Fri Jun 2 04:59:25 2017 +0000

----------------------------------------------------------------------
 be/src/service/impalad-main.cc | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/f3fdd4d4/be/src/service/impalad-main.cc
----------------------------------------------------------------------
diff --git a/be/src/service/impalad-main.cc b/be/src/service/impalad-main.cc
index 01e5b55..805e0f2 100644
--- a/be/src/service/impalad-main.cc
+++ b/be/src/service/impalad-main.cc
@@ -89,13 +89,6 @@ int ImpaladMain(int argc, char** argv) {
   ABORT_IF_ERROR(CreateImpalaServer(&exec_env, FLAGS_beeswax_port, 
FLAGS_hs2_port,
       FLAGS_be_port, &beeswax_server, &hs2_server, &be_server, &server));
 
-  ABORT_IF_ERROR(be_server->Start());
-
-  if (FLAGS_is_coordinator) {
-    ABORT_IF_ERROR(beeswax_server->Start());
-    ABORT_IF_ERROR(hs2_server->Start());
-  }
-
   Status status = exec_env.StartServices();
   if (!status.ok()) {
     LOG(ERROR) << "Impalad services did not start correctly, exiting.  Error: "
@@ -103,6 +96,16 @@ int ImpaladMain(int argc, char** argv) {
     ShutdownLogging();
     exit(1);
   }
+
+  DCHECK(exec_env.process_mem_tracker() != nullptr)
+      << "ExecEnv::StartServices() must be called before starting RPC 
services";
+  ABORT_IF_ERROR(be_server->Start());
+
+  if (FLAGS_is_coordinator) {
+    ABORT_IF_ERROR(beeswax_server->Start());
+    ABORT_IF_ERROR(hs2_server->Start());
+  }
+
   ImpaladMetrics::IMPALA_SERVER_READY->set_value(true);
   LOG(INFO) << "Impala has started.";
 

Reply via email to