IMPALA-4274: hang in buffered-block-mgr-test We started seeing hangs in CreateDestroyMulti() where a thread was recursively acquiring static_block_mgrs_lock_. This is only possible because a shared_ptr is destroyed while holding the lock.
The fix is to reset the shared_ptr only after releasing the lock. Testing: I was unable to reproduce the hang locally, but the callstack in the JIRA was a strong enough smoking gun to feel confident that this should fix the hang. Change-Id: I21f3da1d09cdd101a28ee850f46f24acd361b604 Reviewed-on: http://gerrit.cloudera.org:8080/4690 Reviewed-by: Alex Behm <[email protected]> Reviewed-by: Marcel Kornacker <[email protected]> Tested-by: Internal Jenkins Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/b28baa4a Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/b28baa4a Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/b28baa4a Branch: refs/heads/master Commit: b28baa4a038e78b7d3ba26c50b78d9b28cf901a9 Parents: 3e23e40 Author: Tim Armstrong <[email protected]> Authored: Tue Oct 11 17:37:38 2016 -0700 Committer: Internal Jenkins <[email protected]> Committed: Wed Oct 12 08:37:30 2016 +0000 ---------------------------------------------------------------------- be/src/runtime/buffered-block-mgr.cc | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/b28baa4a/be/src/runtime/buffered-block-mgr.cc ---------------------------------------------------------------------- diff --git a/be/src/runtime/buffered-block-mgr.cc b/be/src/runtime/buffered-block-mgr.cc index f45a93b..f1d947e 100644 --- a/be/src/runtime/buffered-block-mgr.cc +++ b/be/src/runtime/buffered-block-mgr.cc @@ -524,6 +524,7 @@ Status BufferedBlockMgr::TransferBuffer(Block* dst, Block* src, bool unpin) { } BufferedBlockMgr::~BufferedBlockMgr() { + shared_ptr<BufferedBlockMgr> other_mgr_ptr; { lock_guard<SpinLock> lock(static_block_mgrs_lock_); BlockMgrsMap::iterator it = query_to_block_mgrs_.find(query_id_); @@ -536,16 +537,19 @@ BufferedBlockMgr::~BufferedBlockMgr() { // distinguish between the two expired pointers), and when the other // ~BufferedBlockMgr() call occurs, it won't find an entry for this query_id_. if (it != query_to_block_mgrs_.end()) { - shared_ptr<BufferedBlockMgr> mgr = it->second.lock(); - if (mgr.get() == NULL) { + other_mgr_ptr = it->second.lock(); + if (other_mgr_ptr.get() == NULL) { // The BufferBlockMgr object referenced by this entry is being deconstructed. query_to_block_mgrs_.erase(it); } else { // The map references another (still valid) BufferedBlockMgr. - DCHECK_NE(mgr.get(), this); + DCHECK_NE(other_mgr_ptr.get(), this); } } } + // IMPALA-4274: releasing the reference count can recursively call ~BufferedBlockMgr(). + // Do not do that with 'static_block_mgrs_lock_' held. + other_mgr_ptr.reset(); if (io_request_context_ != NULL) io_mgr_->UnregisterContext(io_request_context_);
