IMPALA-3757: Add missing lock in RuntimeProfile::ComputeTimeInProfile We've seen a number of issues that appear to be a race around the RuntimeProfile handling on the coordinator. We don't have a consistent repro and haven't been able to find the root cause of the crashes/hangs, but we have seen a number of stacks from crashes that include the following frames:
... RuntimeProfile::ComputeTimeInProfile RuntimeProfile::UpdateAverage Coordinator::UpdateAverageProfile Coordinator::UpdateFragmentExecStatus ... This patch fixes a case where we iterate over children_ without holding the lock. I can't prove that this causes the hangs but we certainly should fix this regardless. Testing: Passed several exhaustive test runs successfully. Change-Id: Ibb40ebfa2e58ecaa8668d30ad92003fc75483517 Reviewed-on: http://gerrit.cloudera.org:8080/3397 Reviewed-by: Matthew Jacobs <[email protected]> Tested-by: Internal Jenkins Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/d75f327a Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/d75f327a Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/d75f327a Branch: refs/heads/master Commit: d75f327adda7377300e448cdda12d321da837efe Parents: 59cdec2 Author: Matthew Jacobs <[email protected]> Authored: Thu Jun 16 16:31:48 2016 -0700 Committer: Taras Bobrovytsky <[email protected]> Committed: Thu Jul 14 19:04:44 2016 +0000 ---------------------------------------------------------------------- be/src/util/runtime-profile.cc | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/d75f327a/be/src/util/runtime-profile.cc ---------------------------------------------------------------------- diff --git a/be/src/util/runtime-profile.cc b/be/src/util/runtime-profile.cc index 714180a..ce5e934 100644 --- a/be/src/util/runtime-profile.cc +++ b/be/src/util/runtime-profile.cc @@ -363,8 +363,11 @@ void RuntimeProfile::ComputeTimeInProfile(int64_t total) { local_time_percent_ = ::min(1.0, local_time_percent_) * 100; // Recurse on children - for (int i = 0; i < children_.size(); ++i) { - children_[i].first->ComputeTimeInProfile(total); + { + lock_guard<SpinLock> l(children_lock_); + for (int i = 0; i < children_.size(); ++i) { + children_[i].first->ComputeTimeInProfile(total); + } } }
