[ 
https://issues.apache.org/jira/browse/IMPALA-12984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Maheshwari updated IMPALA-12984:
---------------------------------------
    Attachment: Profile with slow data exchanges.txt

> Show inactivity of data exchanges in query profile
> --------------------------------------------------
>
>                 Key: IMPALA-12984
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12984
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Distributed Exec
>            Reporter: Riza Suminto
>            Priority: Major
>         Attachments: Profile with slow data exchanges.txt
>
>
> Many-to-many data exchanges can be bottlenecked by a hotspot receiver, such
> as the scenario described in IMPALA-6692, or when data spilling happens in a
> subset of backends. Ideally, these occurrences should be easy to spot in the
> query profile, but triaging this kind of issue often requires correlating
> several counters in the query profile. Here are a few ideas on how to improve
> this identification:
>  # Upon query completion, let the coordinator do some profile analysis and
> print a WARNING in the query profile pointing at the skew. One group of
> EXCHANGE senders and receivers can only complete simultaneously, since all
> receivers need to wait for the EOS signal from all senders. Say we take the
> max of TotalNetworkSendTime across all senders and the max of DataWaitTime
> across all receivers. A "mutual wait" time of
> min(TotalNetworkSendTime,DataWaitTime) over those two maxima can then be
> used as an indicator of how long the exchanges are waiting for the query
> operators above them to progress.
>  # Add a "Max Inactive" column to the ExecSummary table. The existing
> "Avg Time" and "Max Time" columns are derived from
> RuntimeProfileBase::local_time_ns_. If ExecSummary also displays the maximum
> value of RuntimeProfileBase::inactive_timer_ for each query operator as
> "Max Inactive", we can compare it against "Max Time" and figure out which
> exchange spends most of its time idle, waiting. The calculation relating
> local_time_ns, children_total_time, and inactive_timer can be seen at
> [https://github.com/apache/impala/blob/0721858/be/src/util/runtime-profile.cc#L935-L938]
>  
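
To make the "mutual wait" idea in the first item concrete, here is a minimal
sketch in Python. It is not Impala code: the function name and the plain lists
of per-instance counter values are hypothetical, and only the counter names
TotalNetworkSendTime and DataWaitTime come from the description above.

```python
from typing import Sequence


def mutual_wait_ns(
    sender_send_times_ns: Sequence[int],
    receiver_wait_times_ns: Sequence[int],
) -> int:
    """Return min(max sender send time, max receiver wait time).

    sender_send_times_ns: TotalNetworkSendTime of each sender instance
        in one exchange group (hypothetical input shape).
    receiver_wait_times_ns: DataWaitTime of each receiver instance
        in the same exchange group (hypothetical input shape).
    """
    # With no senders or no receivers there is nothing to compare.
    if not sender_send_times_ns or not receiver_wait_times_ns:
        return 0
    max_send = max(sender_send_times_ns)
    max_wait = max(receiver_wait_times_ns)
    # Both sides were waiting for at least this long.
    return min(max_send, max_wait)


# Example: three senders and two receivers, values in nanoseconds.
example = mutual_wait_ns([5, 9, 7], [12, 4])
```

A coordinator-side check could compare this value against the total query
time and emit the WARNING only above some threshold. The threshold itself
would need tuning and is not specified in the issue.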



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
