IMPALA-7517. Fix hang in scanner threads when soft limit is exceeded

As described in the JIRA, when extra scanner threads (every thread except the first) see that the soft limit has been exceeded, they shut down to free up memory. In certain interleavings, this could cause all of the scanner threads to exit without any of them marking the scan as completed, leaving the query hung.
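The ordering problem can be illustrated with a small, single-threaded C++ model that replays one problematic interleaving by hand. This is a hypothetical sketch, not Impala code: `ExitReason`, `ScanState`, `CheckAfterRange()` and `ScanMarkedDone()` are illustrative names, and the real `ScannerThread()` loop involves locking, the I/O manager and row-batch queues that are omitted here. The `soft_limit_first` flag switches between the pre-patch order (soft limit checked before the completion checks) and the patched order (soft limit checked after them).

```cpp
#include <cassert>

// Hypothetical, heavily simplified model of the exit checks a scanner thread
// runs after finishing a scan range. Not Impala code; names are illustrative.
enum class ExitReason { kContinue, kMarkedDone, kSoftLimit, kAllRangesStarted };

struct ScanState {
  int ranges_remaining = 0;  // ranges not yet fully scanned
  bool done = false;         // stands in for marking the scan as done
};

// 'soft_limit_first' == true models the pre-patch ordering; false models the
// patched ordering, where the soft limit is checked last.
ExitReason CheckAfterRange(ScanState& s, bool first_thread, bool soft_limit_exceeded,
                           bool more_ranges_to_start, bool soft_limit_first) {
  // Only extra (non-first) threads exit to free up memory.
  const bool soft_limit_exit = !first_thread && soft_limit_exceeded;
  if (soft_limit_first && soft_limit_exit) return ExitReason::kSoftLimit;
  if (s.ranges_remaining == 0) {
    s.done = true;  // All ranges finished: mark the scan as complete.
    return ExitReason::kMarkedDone;
  }
  if (!more_ranges_to_start) return ExitReason::kAllRangesStarted;
  if (!soft_limit_first && soft_limit_exit) return ExitReason::kSoftLimit;
  return ExitReason::kContinue;
}

// Replays one problematic two-thread interleaving and reports whether any
// thread marked the scan as done.
bool ScanMarkedDone(bool soft_limit_first) {
  ScanState s;
  s.ranges_remaining = 1;  // the extra thread is still scanning the last range

  // 1. The first thread finds no range left to start, so it exits via the
  //    "all ranges started" path. It never exits on the soft limit.
  ExitReason first = CheckAfterRange(s, /*first_thread=*/true,
                                     /*soft_limit_exceeded=*/true,
                                     /*more_ranges_to_start=*/false, soft_limit_first);
  assert(first == ExitReason::kAllRangesStarted);
  (void)first;

  // 2. The extra thread finishes the last range while over the soft limit.
  s.ranges_remaining = 0;
  CheckAfterRange(s, /*first_thread=*/false, /*soft_limit_exceeded=*/true,
                  /*more_ranges_to_start=*/false, soft_limit_first);
  return s.done;
}
```

In this model, `ScanMarkedDone(true)` (pre-patch order) returns false: both threads have left the loop and nothing marked the scan as done, which in the real scan node leaves the query waiting. `ScanMarkedDone(false)` (patched order) returns true, because the thread that finishes the last range always reaches the completion check before it can exit on the soft limit.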
This patch adds a new fault point that injects fake soft limit errors and enables it in the scanner tests. With the soft limit check in its previous position, the fault point caused query hangs fairly reliably; with the check moved after the scan-completion checks, the tests now pass.

Change-Id: I3dc1a2ec79c823575d7d40e7b52216dea5b0ddde
Reviewed-on: http://gerrit.cloudera.org:8080/11369
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Todd Lipcon <t...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/13e93e75
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/13e93e75
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/13e93e75

Branch: refs/heads/master
Commit: 13e93e75bf1cab44f80ceee17b0b0abde8ccd034
Parents: fa0869d
Author: Todd Lipcon <t...@apache.org>
Authored: Thu Aug 30 15:54:47 2018 -0700
Committer: Todd Lipcon <t...@apache.org>
Committed: Fri Aug 31 17:34:33 2018 +0000

----------------------------------------------------------------------
 be/src/exec/hdfs-scan-node.cc     | 14 +++++++++-----
 tests/query_test/test_scanners.py |  3 +++
 2 files changed, 12 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/13e93e75/be/src/exec/hdfs-scan-node.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/hdfs-scan-node.cc b/be/src/exec/hdfs-scan-node.cc
index 3755e50..e8adc56 100644
--- a/be/src/exec/hdfs-scan-node.cc
+++ b/be/src/exec/hdfs-scan-node.cc
@@ -417,11 +417,6 @@ void HdfsScanNode::ScannerThread(bool first_thread, int64_t scanner_thread_reser
       break;
     }

-    // Stop extra threads if we're over a soft limit in order to free up memory.
-    if (!first_thread && mem_tracker_->AnyLimitExceeded(MemLimit::SOFT)) {
-      break;
-    }
-
     // Done with range and it completed successfully
     if (progress_.done()) {
       // All ranges are finished. Indicate we are done.
@@ -437,6 +432,15 @@ void HdfsScanNode::ScannerThread(bool first_thread, int64_t scanner_thread_reser
       all_ranges_started_ = true;
       break;
     }
+
+    // Stop extra threads if we're over a soft limit in order to free up memory.
+    if (!first_thread &&
+        (mem_tracker_->AnyLimitExceeded(MemLimit::SOFT) ||
+         !DebugAction(runtime_state_->query_options(),
+             "HDFS_SCANNER_THREAD_CHECK_SOFT_MEM_LIMIT").ok())) {
+      VLOG_QUERY << "Soft memory limit exceeded. Extra scanner thread exiting.";
+      break;
+    }
   }

   {


http://git-wip-us.apache.org/repos/asf/impala/blob/13e93e75/tests/query_test/test_scanners.py
----------------------------------------------------------------------
diff --git a/tests/query_test/test_scanners.py b/tests/query_test/test_scanners.py
index c9ad888..b026475 100644
--- a/tests/query_test/test_scanners.py
+++ b/tests/query_test/test_scanners.py
@@ -64,6 +64,9 @@ DEBUG_ACTION_DIMS = [None,
   '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@0.5',
   '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0']

+# Trigger injected soft limit failures when scanner threads check memory limit.
+DEBUG_ACTION_DIMS.append('HDFS_SCANNER_THREAD_CHECK_SOFT_MEM_LIMIT:FAIL@0.5')
+
 class TestScannersAllTableFormats(ImpalaTestSuite):
   BATCH_SIZES = [0, 1, 16]
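The test enables the new fault point with the debug-action string `HDFS_SCANNER_THREAD_CHECK_SOFT_MEM_LIMIT:FAIL@0.5`, read here as "fail this check with probability 0.5". The C++ sketch below models that behavior under stated assumptions: `FaultSpec`, `ParseFaultSpec()`, `ShouldInjectFault()` and `ExtraThreadShouldExit()` are hypothetical names, the `LABEL:FAIL@p` grammar is inferred from this single example, and the code is not Impala's `DebugAction()` implementation. The real debug-action machinery also accepts other forms, such as the `-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@0.5` entries in the test, which this sketch does not handle.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <string>

// Hypothetical model of a probabilistic fault point. NOT Impala's DebugAction();
// the "LABEL:FAIL@probability" grammar is an assumption for illustration.
struct FaultSpec {
  std::string label;
  double probability = 0.0;
};

// Parses "LABEL:FAIL@p" with p in [0, 1]. Returns false on any other shape.
bool ParseFaultSpec(const std::string& action, FaultSpec* out) {
  assert(out != nullptr);
  const std::string kFail = ":FAIL@";
  const std::size_t pos = action.find(kFail);
  if (pos == std::string::npos || pos == 0) return false;
  const std::string prob_str = action.substr(pos + kFail.size());
  double p = 0.0;
  std::size_t consumed = 0;
  try {
    p = std::stod(prob_str, &consumed);
  } catch (const std::logic_error&) {  // std::invalid_argument or std::out_of_range
    return false;
  }
  // Reject trailing junk and anything outside [0, 1], including NaN.
  if (consumed != prob_str.size() || !(p >= 0.0 && p <= 1.0)) return false;
  out->label = action.substr(0, pos);
  out->probability = p;
  return true;
}

// Returns true ("inject a failure") when 'action' names 'label' and a random
// draw succeeds with the configured probability.
bool ShouldInjectFault(const std::string& action, const std::string& label,
                       std::mt19937& rng) {
  FaultSpec spec;
  if (!ParseFaultSpec(action, &spec) || spec.label != label) return false;
  std::bernoulli_distribution fail(spec.probability);
  return fail(rng);
}

// Mirrors the shape of the patched condition: an extra (non-first) thread exits
// if the real soft limit is exceeded OR the fault point fires.
bool ExtraThreadShouldExit(bool first_thread, bool soft_limit_exceeded,
                           const std::string& action, std::mt19937& rng) {
  return !first_thread &&
         (soft_limit_exceeded ||
          ShouldInjectFault(action, "HDFS_SCANNER_THREAD_CHECK_SOFT_MEM_LIMIT", rng));
}
```

Because `||` short-circuits, the fault point is consulted only when the real soft limit is not exceeded, and the first thread never exits through this path, matching the `!first_thread && (...)` structure of the patched check. At probability 0.5, different iterations of the same query take both the exit and the continue branches, which is what exposes ordering bugs like the one fixed here.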