[ 
https://issues.apache.org/jira/browse/IMPALA-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho resolved IMPALA-6685.
--------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.12.0
                   Impala 3.0

> Improve profile in KrpcDataStreamRecvr and KrpcDataStreamSender
> ---------------------------------------------------------------
>
>                 Key: IMPALA-6685
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6685
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Distributed Exec
>    Affects Versions: Impala 3.0, Impala 2.12.0
>            Reporter: Michael Ho
>            Assignee: Michael Ho
>            Priority: Major
>              Labels: observability
>             Fix For: Impala 3.0, Impala 2.12.0
>
>
> The existing profiles in KrpcDataStreamRecvr and KrpcDataStreamSender make it 
> hard to diagnose slow queries such as those shown in IMPALA-6657. In 
> particular, there are times when the receiver's profile shows a lot of time 
> spent waiting for row batches to arrive while the sender's profile also shows 
> a lot of time spent waiting for responses to the TransmitData() RPC. 
> A few improvements can be made to make the problem slightly easier to 
> diagnose:
> - track the number of deferred row batches over time in KrpcDataStreamRecvr
> - track the number of bytes dequeued over time in KrpcDataStreamRecvr
> - track the amount of time row batches spend in the deferred queue
> - track the number of bytes sent from KrpcDataStreamSender over time
> The above items help identify cases in which one fragment instance 
> containing an exchange node is slow for a period of time (e.g. the parent of 
> the exchange node spills heavily), causing all senders to that fragment 
> instance to block waiting for responses. Because all senders are blocked 
> waiting for their previous RPCs to complete, they do not produce more rows, 
> and all other fragment instances are starved, leading to the high wait time 
> shown in their receivers' profiles. The time series counter for the number of 
> deferred row batches in a receiver helps identify such cases.
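
As a rough illustration of the last point, the sketch below samples the length of a deferred queue over time and then flags receivers whose backlog reached a threshold. It is a simplified model and not Impala's implementation: SampledCounter, backlogged_receivers, the instance names, the workload and the threshold are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SampledCounter:
    """Hypothetical time series counter: stores one sample per sample() call."""
    read_value: Callable[[], int]
    samples: List[int] = field(default_factory=list)

    def sample(self) -> None:
        # Record the current value, e.g. the length of the deferred queue.
        self.samples.append(self.read_value())


def backlogged_receivers(series: Dict[str, List[int]], threshold: int) -> List[str]:
    """Return receivers whose deferred-batch count reached `threshold` in any sample."""
    return sorted(name for name, s in series.items() if s and max(s) >= threshold)


# Hypothetical workload: the consumer of instance_a stalls during ticks 1-3,
# so two row batches are deferred per tick; afterwards the queue drains.
deferred_queue: List[str] = []
counter = SampledCounter(read_value=lambda: len(deferred_queue))
for tick in range(6):
    if 1 <= tick <= 3:
        deferred_queue.extend(["batch"] * 2)
    else:
        deferred_queue.clear()
    counter.sample()

series = {
    "instance_a": counter.samples,       # backlog builds up, then drains
    "instance_b": [0, 0, 0, 0, 0, 0],    # never defers a batch
}
print(backlogged_receivers(series, threshold=4))
```

In this model the receiver with the growing backlog is the slow instance that its senders are blocked on, while a receiver with an empty deferred queue and a high wait time is more likely one of the starved instances.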



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
