[jira] [Commented] (IMPALA-5200) Profile timers not updated during long-running sort

Joe McDonnell (JIRA) Wed, 24 Oct 2018 17:23:59 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663037#comment-16663037
 ]


Joe McDonnell commented on IMPALA-5200:
---------------------------------------

Regardless of whether we have something equivalent to the counters I described 
previously, we can do a better job of implementing 
RuntimeProfile::ComputeTimeInProfile() to make it more sensible while a query 
is executing. This function should take advantage of the fact that a profile 
includes all of its childrens' time. Specifically, if a profile's total time is 
profile_total_time and its summed childrens' total time is children_total_time, 
then a profile's total time is max(children_total_time, profile_total_time). If 
a profile's total time is 0 because of the SCOPED_TIMER issue described above, 
it can sensibly use the total time of its children. This prevents it from 
reporting the local time percentage as 100% when it is stuck in child code.

A pseudocode recursive version looks something like this (this ignores locking, 
custom local times, and probably other things):
{code:java}
void computeTimeInProfile() {
  // Recurse and process children so that they are up to date
  for child in children_:
    child.computeTimeInProfile();

  // All the children have up-to-date total time. Iterate and sum it.
  children_total_time = 0;
  for child in children_:
    children_total_time += child.total_time();

  profile_total_time = total_time_counter()->value();
  total_time = max(children_total_time, profile_total_time);

  // For profiles that have the SCOPED_TIMER condition described above, this 
condition means that local_time is 0 and local_time_percentage = 0, which is 
more sensible than 100%.
  local_time = total_time - children_total_time;
  local_time_percentage = (100 * local_time) / total_time;
}{code}
If the leaf node is waiting in an indefinite wait (and thus is not updating its 
counters), then this will not help. That's ok, because that is a small subset 
of the issues that we encounter.

> Profile timers not updated during long-running sort
> ---------------------------------------------------
>
>                 Key: IMPALA-5200
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5200
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.9.0
>            Reporter: Tim Armstrong
>            Priority: Minor
>              Labels: observability, ramp-up, supportability
>
> If you have a query plan with a long-running sort operation (e.g. minutes), 
> the profile timers are not updated to reflect the time spent in the sort 
> until the sort starts returning rows.
> E.g. this is a summary from a sort query that was running for a few hours 
> (!). The summary was misleading and the "heat map" plan in the debug web UI 
> is misleading - it showed the join as the "hot" operator. It would be ideal 
> if we could somehow at least periodically update the time spent in the 
> operator.
> {code}
> Operator              #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   
> Peak Mem  Est. Peak Mem  Detail                   
> ----------------------------------------------------------------------------------------------------------------------------
> 05:MERGING-EXCHANGE        1    0.000ns    0.000ns        0     635.58M       
>    0        -1.00 B  UNPARTITIONED            
> 03:SORT                    1    0.000ns    0.000ns        0     635.58M   
> 47.86 GB      800.00 MB                           
> 02:HASH JOIN               1    4s859ms    4s859ms  771.02M     635.58M  
> 162.11 MB       16.03 MB  INNER JOIN, BROADCAST    
> |--04:EXCHANGE             1   38.988ms   38.988ms  247.20K     247.20K       
>    0              0  BROADCAST                
> |  01:SCAN HDFS            1    8s089ms    8s089ms  247.20K     247.20K    
> 3.79 MB       32.00 MB  product b     
> 00:SCAN HDFS               1  209.997ms  209.997ms   15.09M     635.58M  
> 185.27 MB      176.00 MB  sales a
> {code}
> http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Impala-join-query-running-slow



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (IMPALA-5200) Profile timers not updated during long-running sort

Reply via email to