[ https://issues.apache.org/jira/browse/IMPALA-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Ho resolved IMPALA-6685.
--------------------------------
Resolution: Fixed
Fix Version/s: Impala 2.12.0, Impala 3.0
> Improve profile in KrpcDataStreamRecvr and KrpcDataStreamSender
> ---------------------------------------------------------------
>
> Key: IMPALA-6685
> URL: https://issues.apache.org/jira/browse/IMPALA-6685
> Project: IMPALA
> Issue Type: Sub-task
> Components: Distributed Exec
> Affects Versions: Impala 3.0, Impala 2.12.0
> Reporter: Michael Ho
> Assignee: Michael Ho
> Priority: Major
> Labels: observability
> Fix For: Impala 3.0, Impala 2.12.0
>
>
> The existing profiles in KrpcDataStreamRecvr and KrpcDataStreamSender make it
> hard to diagnose slow queries such as those in IMPALA-6657. In particular,
> there are times when the receiver's profile shows a lot of time waiting for
> row batches to arrive while the sender's profile also shows a lot of time
> waiting for responses to the TransmitData() RPC.
> A few improvements can be made to make the problem slightly easier to
> diagnose:
> - track the number of deferred row batches over time in KrpcDataStreamRecvr
> - track the number of bytes dequeued over time in KrpcDataStreamRecvr
> - track the amount of time row batches spend in the deferred queue
> - track the number of bytes sent from KrpcDataStreamSender over time
> The above items help identify cases in which one fragment instance
> containing an exchange node is slow for a period of time (e.g. the parent of
> the exchange node spills heavily), causing all senders to that fragment
> instance to block waiting for responses. As all senders are blocked waiting
> for the previous RPC to complete, they will not produce more rows and all
> other fragment instances will be starved, leading to the high wait time shown
> in their receivers' profiles. The time series counter for the number of
> deferred row batches in a receiver helps identify the cases described above.
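The starvation scenario described above can be illustrated with a small toy simulation. This is only a sketch under stated assumptions, not Impala code: `ReceiverModel`, `simulate`, the tick-based clock, the fixed queue capacity, and the rule that a sender with any deferred batch stops sending to every receiver are all hypothetical simplifications chosen for illustration. It samples the deferred-queue length once per tick, loosely mirroring the proposed time series counter.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ReceiverModel:
    """Toy stand-in for one receiver (hypothetical; not KrpcDataStreamRecvr)."""
    queue_capacity: int   # batches the receiver can buffer
    drain_per_tick: int   # batches the parent operator can consume per tick
    queued: int = 0
    dequeued: int = 0             # total batches consumed by the parent
    unmet_demand: int = 0         # batches the parent wanted but did not get
    deferred_wait_ticks: int = 0  # total ticks batches spent deferred
    deferred: deque = field(default_factory=deque)        # (sender, tick)
    deferred_samples: list = field(default_factory=list)  # one sample per tick

    def offer(self, sender, tick):
        """Accept a batch if there is room; otherwise defer it."""
        if self.queued < self.queue_capacity:
            self.queued += 1
            return True
        self.deferred.append((sender, tick))
        return False

    def step(self, tick):
        """Consume batches, admit deferred ones, then sample the counter."""
        consumed = min(self.queued, self.drain_per_tick)
        self.queued -= consumed
        self.dequeued += consumed
        self.unmet_demand += self.drain_per_tick - consumed
        while self.deferred and self.queued < self.queue_capacity:
            _, arrived = self.deferred.popleft()
            self.deferred_wait_ticks += tick - arrived
            self.queued += 1
        self.deferred_samples.append(len(self.deferred))


def simulate(ticks, num_senders, drain_rates, queue_capacity=3):
    """Every unblocked sender sends one batch to every receiver per tick."""
    receivers = [ReceiverModel(queue_capacity, d) for d in drain_rates]
    blocked = set()  # senders still waiting on a deferred batch
    for tick in range(ticks):
        for sender in range(num_senders):
            if sender in blocked:
                continue
            for receiver in receivers:
                receiver.offer(sender, tick)
        for receiver in receivers:
            receiver.step(tick)
        blocked = {s for r in receivers for s, _ in r.deferred}
    return receivers


if __name__ == "__main__":
    fast, slow = simulate(ticks=20, num_senders=3, drain_rates=[3, 1])
    print("slow receiver deferred samples:", slow.deferred_samples)
    print("fast receiver deferred samples:", fast.deferred_samples)
    print("fast receiver unmet demand:", fast.unmet_demand)
```

In this model the slow receiver's deferred-batch samples rise above zero while the fast receiver's stay at zero, yet the fast receiver still goes short of input because the senders are blocked on the slow one. A per-receiver time series makes that asymmetry visible, which a single aggregate wait time would hide.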