[ https://issues.apache.org/jira/browse/IMPALA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong updated IMPALA-5473:
----------------------------------
Component/s: Distributed Exec
> Make diagnosing network issues easier
> -------------------------------------
>
> Key: IMPALA-5473
> URL: https://issues.apache.org/jira/browse/IMPALA-5473
> Project: IMPALA
> Issue Type: Task
> Components: Distributed Exec
> Affects Versions: Impala 2.10.0
> Reporter: Henry Robinson
> Priority: Major
> Labels: observability, supportability
>
> With our current metrics in the profile, it's hard to debug queries that see
> low throughput from their exchanges.
> The following cases have different causes but similar symptoms (e.g. a high
> {{InactiveTimer}} in the exchange node's profile):
> 1. Downstream sender does not produce rows quickly (perhaps because *its*
> child instances do not produce rows quickly).
> 2. Downstream sender cannot _send_ rows quickly, perhaps because of network
> congestion.
> 3. Downstream sender does not start producing rows until some time after the
> upstream receiver has started (captured by {{FirstBatchArrivalWaitTime}}).
> 4. Downstream sender does not close the stream until some time after all rows
> are sent.
> We should improve these metrics so that the runtime profile clearly shows who
> is slow, and why. Distinguishing cases 1 and 2 is particularly important.
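A minimal sketch of how per-sender timings could separate the four cases, assuming the profile recorded one timer per case for each sender. {{SenderTimings}}, its fields, and {{dominant_cause}} are hypothetical names used only for illustration; they are not existing Impala profile counters. The sketch simply reports whichever timer is largest.

```python
from dataclasses import dataclass
from enum import Enum


class Cause(Enum):
    UNKNOWN = 0
    SLOW_PRODUCER = 1  # case 1: sender's own child produces rows slowly
    SLOW_NETWORK = 2   # case 2: sender is blocked sending rows
    LATE_START = 3     # case 3: sender starts long after the receiver
    LATE_CLOSE = 4     # case 4: sender closes the stream long after the last row


@dataclass
class SenderTimings:
    # Hypothetical per-sender timers, in seconds; NOT real Impala counters.
    produce_s: float       # time waiting on the sender's own child for rows
    send_blocked_s: float  # time blocked in the network send path
    start_delay_s: float   # delay between receiver start and sender's first batch
    close_delay_s: float   # delay between last row sent and stream close


def dominant_cause(t: SenderTimings) -> Cause:
    """Return the case whose timer accounts for the most time, if any."""
    candidates = {
        Cause.SLOW_PRODUCER: t.produce_s,
        Cause.SLOW_NETWORK: t.send_blocked_s,
        Cause.LATE_START: t.start_delay_s,
        Cause.LATE_CLOSE: t.close_delay_s,
    }
    cause, seconds = max(candidates.items(), key=lambda kv: kv[1])
    return cause if seconds > 0 else Cause.UNKNOWN
```

For example, {{dominant_cause(SenderTimings(0.2, 5.0, 0.1, 0.0))}} returns {{Cause.SLOW_NETWORK}}. The point for cases 1 and 2 is that the sender has to time waiting on its child separately from time blocked on the network; a single receiver-side timer such as {{InactiveTimer}} cannot split the two.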
--
This message was sent by Atlassian Jira
(v8.3.4#803005)