[ 
https://issues.apache.org/jira/browse/IMPALA-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated IMPALA-7517:
--------------------------------
    Description: 
As reported on the mailing list, this is a regression due to IMPALA-7096 
(7ccf7369085aa49a8fc0daf6f91d97b8a3135682). The scanner thread has the 
following code:

{code}
    // Stop extra threads if we're over a soft limit in order to free up memory.
    if (!first_thread && mem_tracker_->AnyLimitExceeded(MemLimit::SOFT)) {
      break;
    }
 
    // Done with range and it completed successfully
    if (progress_.done()) {
      // All ranges are finished.  Indicate we are done.
      SetDone();
      break;
    }
 
    if (scan_range == nullptr && num_unqueued_files == 0) {
      unique_lock<mutex> l(lock_);
      // All ranges have been queued and DiskIoMgr has no more new ranges for this scan
      // node to process. This means that every range is either done or being processed by
      // another thread.
      all_ranges_started_ = true;
      break;
    }
  }
{code}
 
What if we have the following scenario:
  
 T1) grab scan range 1 and start processing
  
 T2) grab scan range 2 and start processing
  
 T1) finish scan range 1 and see that 'progress_' is not done()
 T1) loop around, get no scan range (there are no more), so set all_ranges_started_ and break
 T1) thread exits
  
 T2) finish scan range 2
 T2) happen to hit a soft memory limit due to pressure from other exec nodes, etc. Since we aren't the first thread, we break, even though the first thread is no longer running.
 T2) thread exits
  
 Note that no thread reaches SetDone(): T1 exited while progress_ was not yet done(), and T2 breaks on the soft memory limit check _before_ checking progress_.done().
  
 Thus, the query will hang forever.
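
The interleaving above can be replayed deterministically. The following is a minimal sketch, not Impala code: ScanNodeModel, CheckOrder, ShouldExit and ReplayScenario are hypothetical names, and the model keeps only the state the scenario touches. It compares the check order quoted above with one possible alternative (checking progress before the soft limit), which is an assumption and not necessarily the change that gets committed.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the scan node state used by the
// scanner thread loop. Only the fields the scenario touches are modelled.
struct ScanNodeModel {
  int ranges_total = 2;
  int ranges_issued = 0;
  int ranges_completed = 0;
  bool all_ranges_started = false;
  bool soft_limit_exceeded = false;
  bool done = false;  // Set by SetDone().

  bool ProgressDone() const { return ranges_completed == ranges_total; }
  void SetDone() { done = true; }

  // Returns true if a scan range was handed out, false if none remain.
  bool GetNextRange() {
    if (ranges_issued == ranges_total) return false;
    ++ranges_issued;
    return true;
  }
};

enum class CheckOrder { kSoftLimitFirst, kProgressFirst };

// Models the checks at the end of one loop iteration.
// Returns true if the calling thread would break out of the loop.
bool ShouldExit(ScanNodeModel& node, bool first_thread, bool has_range,
                CheckOrder order) {
  // Assumed alternative ordering: test for completion before the soft limit.
  if (order == CheckOrder::kProgressFirst && node.ProgressDone()) {
    node.SetDone();
    return true;
  }
  // Order as quoted in the description: soft limit check comes first.
  if (!first_thread && node.soft_limit_exceeded) return true;
  if (node.ProgressDone()) {
    node.SetDone();
    return true;
  }
  if (!has_range && node.ranges_issued == node.ranges_total) {
    node.all_ranges_started = true;
    return true;
  }
  return false;
}

// Replays the T1/T2 interleaving and reports whether SetDone() was called.
bool ReplayScenario(CheckOrder order) {
  ScanNodeModel node;
  bool t1_has_range = node.GetNextRange();  // T1 grabs scan range 1.
  bool t2_has_range = node.GetNextRange();  // T2 grabs scan range 2.

  ++node.ranges_completed;  // T1 finishes range 1; progress is not done.
  bool t1_exit = ShouldExit(node, /*first_thread=*/true, t1_has_range, order);
  assert(!t1_exit);

  t1_has_range = node.GetNextRange();  // T1 loops around: no range left.
  t1_exit = ShouldExit(node, /*first_thread=*/true, t1_has_range, order);
  assert(t1_exit && node.all_ranges_started);  // T1 exits.

  ++node.ranges_completed;          // T2 finishes range 2.
  node.soft_limit_exceeded = true;  // Pressure from other exec nodes.
  bool t2_exit = ShouldExit(node, /*first_thread=*/false, t2_has_range, order);
  assert(t2_exit);  // T2 exits under either ordering.

  return node.done;
}
```

Calling ReplayScenario(CheckOrder::kSoftLimitFirst) returns false in this model: both threads have exited and nothing called SetDone(), which corresponds to the hang. ReplayScenario(CheckOrder::kProgressFirst) returns true, because T2 observes that progress is done before it looks at the soft limit.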

  was:
As reported on the mailing list, this is a regression due to IMPALA-7096 
(7ccf7369085aa49a8fc0daf6f91d97b8a3135682). The scanner thread has the 
following code:
 
    // Stop extra threads if we're over a soft limit in order to free up memory.
    if (!first_thread && mem_tracker_->AnyLimitExceeded(MemLimit::SOFT)) {
      break;
    }
 
    // Done with range and it completed successfully
    if (progress_.done()) {
      // All ranges are finished.  Indicate we are done.
      SetDone();
      break;
    }
 
    if (scan_range == nullptr && num_unqueued_files == 0) {
      unique_lock<mutex> l(lock_);
      // All ranges have been queued and DiskIoMgr has no more new ranges for this scan
      // node to process. This means that every range is either done or being processed by
      // another thread.
      all_ranges_started_ = true;
      break;
    }
  }
 
What if we have the following scenario:
 
T1) grab scan range 1 and start processing
 
T2) grab scan range 2 and start processing
 
T1) finish scan range 1 and see that 'progress_' is not done()
T1) loop around, get no scan range (there are no more), so set all_ranges_started_ and break
T1) thread exits
 
T2) finish scan range 2
T2) happen to hit a soft memory limit error due to pressure from other exec 
nodes, etc. Since we aren't the first thread, we break. (even though the first 
thread is no longer running)
T2) thread exits
 
Note that no one got to the point of calling SetDone() because we break due to 
the memory limit error _before_ checking progress_.done().
 
Thus, the query will hang forever.


> Hung scanner when soft memory limit exceeded
> --------------------------------------------
>
>                 Key: IMPALA-7517
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7517
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.1.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical


