[ https://issues.apache.org/jira/browse/IMPALA-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon resolved IMPALA-7517.
---------------------------------
Resolution: Fixed
Fix Version/s: Impala 3.1.0
> Hung scanner when soft memory limit exceeded
> --------------------------------------------
>
> Key: IMPALA-7517
> URL: https://issues.apache.org/jira/browse/IMPALA-7517
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.1.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Critical
> Fix For: Impala 3.1.0
>
>
> As reported on the mailing list, this is a regression due to IMPALA-7096
> (7ccf7369085aa49a8fc0daf6f91d97b8a3135682). The scanner thread has the
> following code:
> {code}
>   // Stop extra threads if we're over a soft limit in order to free up
>   // memory.
>   if (!first_thread && mem_tracker_->AnyLimitExceeded(MemLimit::SOFT)) {
>     break;
>   }
>
>   // Done with range and it completed successfully
>   if (progress_.done()) {
>     // All ranges are finished. Indicate we are done.
>     SetDone();
>     break;
>   }
>
>   if (scan_range == nullptr && num_unqueued_files == 0) {
>     unique_lock<mutex> l(lock_);
>     // All ranges have been queued and DiskIoMgr has no more new ranges for
>     // this scan node to process. This means that every range is either done
>     // or being processed by another thread.
>     all_ranges_started_ = true;
>     break;
>   }
> }
> {code}
>
> What if we have the following scenario:
>
> T1) grab scan range 1 and start processing
>
> T2) grab scan range 2 and start processing
>
> T1) finish scan range 1 and see that 'progress_' is not done()
> T1) loop around, get no scan range (there are no more), so set
> all_ranges_started_ and break
> T1) thread exits
>
> T2) finish scan range 2
> T2) happen to hit a soft memory limit error due to pressure from other exec
> nodes, etc. Since we aren't the first thread, we break. (even though the
> first thread is no longer running)
> T2) thread exits
>
> Note that no thread ever reaches the call to SetDone(), because T2 breaks on
> the memory limit check _before_ checking progress_.done().
>
> Thus, the query will hang forever.
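
To make the interleaving above concrete, here is a minimal single-threaded sketch that replays the T1/T2 steps in order. Everything in it (ScanNodeModel, Checks, RunScenario, the check_progress_first flag) is a hypothetical stand-in written for illustration, not Impala's actual classes. The reordered variant only shows that the order of the checks decides the outcome in this scenario; it is not necessarily the change that was committed for this issue.

```cpp
#include <cassert>

// Hypothetical, simplified model of the scan node state (not Impala code).
struct ScanNodeModel {
  int ranges_total = 2;
  int ranges_handed_out = 0;
  int ranges_completed = 0;
  bool soft_limit_exceeded = false;
  bool all_ranges_started = false;
  bool done = false;  // Set by the stand-in for SetDone().

  bool ProgressDone() const { return ranges_completed == ranges_total; }

  // Returns true if a scan range was available for the calling thread.
  bool GrabRange() {
    if (ranges_handed_out == ranges_total) return false;
    ++ranges_handed_out;
    return true;
  }
};

enum class Exit { kNone, kSoftLimit, kDone, kAllStarted };

// The end-of-iteration checks from the quoted loop. With
// check_progress_first == false they run in the quoted order; with true,
// the progress check is moved ahead of the soft limit check.
Exit Checks(ScanNodeModel& n, bool first_thread, bool had_range,
            bool check_progress_first) {
  if (check_progress_first && n.ProgressDone()) {
    n.done = true;
    return Exit::kDone;
  }
  if (!first_thread && n.soft_limit_exceeded) return Exit::kSoftLimit;
  if (!check_progress_first && n.ProgressDone()) {
    n.done = true;
    return Exit::kDone;
  }
  if (!had_range) {
    n.all_ranges_started = true;
    return Exit::kAllStarted;
  }
  return Exit::kNone;
}

// Replays the scenario step by step and reports whether 'done' was ever set.
bool RunScenario(bool check_progress_first) {
  ScanNodeModel n;
  bool t1_range = n.GrabRange();  // T1 grabs scan range 1.
  bool t2_range = n.GrabRange();  // T2 grabs scan range 2.

  ++n.ranges_completed;  // T1 finishes range 1; progress is not done yet.
  Exit e = Checks(n, /*first_thread=*/true, t1_range, check_progress_first);
  assert(e == Exit::kNone);

  t1_range = n.GrabRange();  // T1 loops around and gets no range.
  e = Checks(n, /*first_thread=*/true, t1_range, check_progress_first);
  assert(e == Exit::kAllStarted);  // T1 exits.

  ++n.ranges_completed;          // T2 finishes range 2.
  n.soft_limit_exceeded = true;  // Memory pressure from other exec nodes.
  e = Checks(n, /*first_thread=*/false, t2_range, check_progress_first);
  (void)e;  // T2 exits either way; only the reason differs.
  return n.done;
}
```

In this model, RunScenario(false) returns false: both threads have exited and nothing marked the node done, which corresponds to the hang. RunScenario(true) returns true, because T2 observes the completed progress before it looks at the soft limit.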