[
https://issues.apache.org/jira/browse/IMPALA-6025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong updated IMPALA-6025:
----------------------------------
Component/s: (was: Backend)
> Improve hang diagnostics
> ------------------------
>
> Key: IMPALA-6025
> URL: https://issues.apache.org/jira/browse/IMPALA-6025
> Project: IMPALA
> Issue Type: Epic
> Components: Distributed Exec
> Affects Versions: Impala 2.9.0
> Reporter: Michael Ho
> Assignee: Lars Volker
> Priority: Major
> Labels: observability, supportability
>
> In the past, users of Impalad have had a hard time getting diagnostic
> information when a query hangs. Usually, that involves a rather manual
> process of determining which fragment instances aren't making progress,
> generating a stack trace or core dump from that Impalad, and inspecting it
> under a debugger. Given the thousands of threads running when multiple
> queries are active, this is quite time consuming.
> This JIRA tracks improvement ideas that we can implement to make debugging
> this kind of issue less painful. Some ideas include:
> - Implement a diagnostic button (analogous to the cancellation button in the
> UI) to dump diagnostic information (e.g. threads' backtraces, executor
> nodes' internals, states of data stream senders and receivers, and lock
> information such as the holder's pid) for fragment instances on some or all
> hosts of a query.
> - Have a watchdog dump backtraces of threads which haven't made progress
> for a while. This probably shouldn't apply to all threads (e.g. idle
> threads shouldn't trigger any alert).
> - A fragment instance can appear to be making no progress because its parent
> operator / fragment may be hung (e.g. the probe side of a join cannot make
> much progress until the build side is done, and the build side itself could
> be another chain of joins). It would be much easier to find the root of the
> cascade of delays if this dependency chain were resolved programmatically.
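
The dependency-chain idea in the last bullet could be sketched roughly as follows. This is only an illustration under assumptions: the `waits_on` map, the `stalled` set, the node names, and the function `find_stall_roots` are all hypothetical and do not correspond to any existing Impala structure. The sketch assumes each operator or fragment instance can report what it is waiting on and whether it is currently making progress.

```python
from typing import Dict, FrozenSet, List, Set


def find_stall_roots(waits_on: Dict[str, List[str]],
                     stalled: Set[str]) -> Dict[str, List[str]]:
    """For each stalled node, follow its stalled dependencies to the end
    of the chain and return the nodes found there (the likely roots)."""

    def roots_of(node: str, seen: FrozenSet[str]) -> Set[str]:
        if node in seen:
            # Dependency cycle: report the node at which the cycle closed.
            return {node}
        seen = seen | {node}
        blocked = [d for d in waits_on.get(node, []) if d in stalled]
        if not blocked:
            # Nothing this node waits on is stalled, so the delay starts here.
            return {node}
        found: Set[str] = set()
        for dep in blocked:
            found |= roots_of(dep, seen)
        return found

    return {n: sorted(roots_of(n, frozenset())) for n in sorted(stalled)}


# Hypothetical plan: the probe side waits on a build side, which is
# itself fed by another join whose input scan is stuck.
example_waits_on = {
    "join1_probe": ["join1_build"],
    "join1_build": ["join2_build"],
    "join2_build": ["scan3"],
}
example_stalled = {"join1_probe", "join1_build", "join2_build", "scan3"}

for node, roots in find_stall_roots(example_waits_on, example_stalled).items():
    print(node, "is ultimately waiting on", roots)
```

Reporting only the end of each chain keeps the output short even when a long chain of joins is stalled behind a single operator.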
> Please feel free to add more ideas to this JIRA.
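
As a starting point for the watchdog idea above, here is a minimal sketch in Python rather than in Impala's C++ backend. The class `ProgressWatchdog` and its methods are hypothetical names made up for illustration. It assumes worker threads call `report_progress()` whenever they do useful work and `mark_idle()` before waiting for new work, so that idle threads never trigger a dump.

```python
import sys
import threading
import time
import traceback
from typing import Dict, Optional


class ProgressWatchdog:
    """Collects backtraces of registered threads that have not reported
    progress for at least `stall_seconds`."""

    def __init__(self, stall_seconds: float) -> None:
        self._stall_seconds = stall_seconds
        self._lock = threading.Lock()
        # Thread ident -> time of last reported progress.
        # Idle threads are simply absent from this map.
        self._last_progress: Dict[int, float] = {}

    def report_progress(self) -> None:
        """Called by a worker thread each time it makes progress."""
        with self._lock:
            self._last_progress[threading.get_ident()] = time.monotonic()

    def mark_idle(self) -> None:
        """Called by a worker thread before it waits for new work."""
        with self._lock:
            self._last_progress.pop(threading.get_ident(), None)

    def check(self, now: Optional[float] = None) -> Dict[int, str]:
        """Return {thread ident: backtrace} for every stalled thread."""
        now = time.monotonic() if now is None else now
        with self._lock:
            stalled = [tid for tid, last in self._last_progress.items()
                       if now - last >= self._stall_seconds]
        # sys._current_frames() is CPython-specific; it maps each live
        # thread's ident to its current stack frame.
        frames = sys._current_frames()
        return {tid: "".join(traceback.format_stack(frames[tid]))
                for tid in stalled if tid in frames}
```

A monitoring thread would call `check()` periodically and log whatever it returns. Requiring threads to opt in through `report_progress()` is one way to keep idle threads from raising alerts.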