[ 
https://issues.apache.org/jira/browse/IMPALA-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved IMPALA-7517.
---------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 3.1.0

> Hung scanner when soft memory limit exceeded
> --------------------------------------------
>
>                 Key: IMPALA-7517
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7517
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.1.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: Impala 3.1.0
>
>
> As reported on the mailing list, this is a regression due to IMPALA-7096 
> (7ccf7369085aa49a8fc0daf6f91d97b8a3135682). The scanner thread has the 
> following code:
> {code}
>     // Stop extra threads if we're over a soft limit in order to free up memory.
>     if (!first_thread && mem_tracker_->AnyLimitExceeded(MemLimit::SOFT)) {
>       break;
>     }
>  
>     // Done with range and it completed successfully
>     if (progress_.done()) {
>       // All ranges are finished.  Indicate we are done.
>       SetDone();
>       break;
>     }
>  
>     if (scan_range == nullptr && num_unqueued_files == 0) {
>       unique_lock<mutex> l(lock_);
>       // All ranges have been queued and DiskIoMgr has no more new ranges for this scan
>       // node to process. This means that every range is either done or being processed by
>       // another thread.
>       all_ranges_started_ = true;
>       break;
>     }
>   }
> {code}
>  
> What if we have the following scenario:
>   
>  T1) grab scan range 1 and start processing
>   
>  T2) grab scan range 2 and start processing
>   
>  T1) finish scan range 1 and see that 'progress_' is not done()
>  T1) loop around, get no scan range (there are no more), so set 
> all_ranges_started_ and break
>  T1) thread exits
>   
>  T2) finish scan range 2
>  T2) happen to hit a soft memory limit error due to pressure from other exec 
> nodes, etc. Since we aren't the first thread, we break, even though the 
> first thread is no longer running.
>  T2) thread exits
>   
>  Note that no thread reached the call to SetDone(), because T2 broke out on 
> the memory limit check _before_ checking progress_.done().
>   
>  Thus, the query will hang forever.
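
The interleaving above can be replayed deterministically in a small, single-threaded model. This is only a sketch: ScanState, EndOfIteration, QueryHangs and the check_done_first flag are hypothetical names introduced for illustration, not Impala code. The flag moves the progress check ahead of the soft-limit check, which is one possible remedy and not necessarily the change that was committed for this issue.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the scan node state described above.
struct ScanState {
  int ranges_total = 2;
  int ranges_completed = 0;
  bool soft_limit_exceeded = false;
  bool all_ranges_started = false;
  bool done = false;  // models the effect of SetDone()

  bool ProgressDone() const { return ranges_completed == ranges_total; }
};

enum class Exit { kContinue, kSoftLimit, kDone, kNoMoreRanges };

// Models the checks at the end of one scanner-thread loop iteration.
// got_range == false stands for (scan_range == nullptr && num_unqueued_files == 0).
// check_done_first is an assumption: it tests progress before the soft limit.
Exit EndOfIteration(ScanState& s, bool first_thread, bool got_range,
                    bool check_done_first) {
  if (check_done_first && s.ProgressDone()) {
    s.done = true;
    return Exit::kDone;
  }
  if (!first_thread && s.soft_limit_exceeded) return Exit::kSoftLimit;
  if (s.ProgressDone()) {
    s.done = true;
    return Exit::kDone;
  }
  if (!got_range) {
    s.all_ranges_started = true;
    return Exit::kNoMoreRanges;
  }
  return Exit::kContinue;
}

// Replays the T1/T2 schedule from the report. Returns true if every thread
// exited without anyone marking the scan as done, i.e. the query would hang.
bool QueryHangs(bool check_done_first) {
  ScanState s;

  // T1 (first thread) finishes range 1; range 2 is still in flight on T2.
  s.ranges_completed = 1;
  EndOfIteration(s, /*first_thread=*/true, /*got_range=*/true, check_done_first);

  // T1 loops around, gets no scan range, sets all_ranges_started and exits.
  EndOfIteration(s, /*first_thread=*/true, /*got_range=*/false, check_done_first);

  // T2 finishes range 2 while the soft memory limit is exceeded, then exits.
  s.ranges_completed = 2;
  s.soft_limit_exceeded = true;
  EndOfIteration(s, /*first_thread=*/false, /*got_range=*/true, check_done_first);

  return !s.done;
}
```

With the original ordering, QueryHangs(false) reports the hang: T2 leaves through the soft-limit branch and nothing ever marks the scan as done. With the progress check moved first, QueryHangs(true) reports no hang, because T2 observes that all ranges are complete before it considers the memory limit.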



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
