[ https://issues.apache.org/jira/browse/FLINK-14127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Piotr Nowojski closed FLINK-14127.
----------------------------------
Fix Version/s: (was: 1.13.0)
Resolution: Won't Fix
For the time being this won't be implemented, because a network bottleneck cannot
be detected solely from the metrics of a single Task/Sub-task; one would also have
to look at the metrics of the neighbouring upstream and downstream tasks.
The problem is that the REST API currently doesn't know how the task nodes are
connected, since the JobGraph is stored in JSON format. It would need to be parsed
(doable, but extra work). Alternatively, this logic could be placed in the WebUI
itself, which already parses the JSON into a graph (to render it), but that is
also non-trivial for me to do (I don't know Java/TypeScript that well) and,
frankly, this kind of logic sounds like it should be done by the REST API.
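The parsing step mentioned above (turning the JSON job graph into upstream/downstream neighbour lists) could look roughly like the sketch below. It assumes a simplified plan shape modelled on the "plan" object of Flink's /jobs/:jobid/plan response, where each node lists the ids of the nodes it reads from under "inputs"; the interfaces and the buildTopology helper are hypothetical, not existing REST API or WebUI code.

```typescript
// Assumed, simplified shape of the JSON job plan: every node lists the ids of
// the nodes it consumes from under "inputs" (sources have no inputs).
interface PlanInput {
  id: string;
}

interface PlanNode {
  id: string;
  inputs?: PlanInput[];
}

interface JobPlan {
  nodes: PlanNode[];
}

// Upstream and downstream neighbours of every vertex, keyed by vertex id.
interface Topology {
  upstream: Record<string, string[]>;
  downstream: Record<string, string[]>;
}

// Hypothetical helper: builds neighbour lists from the plan's input edges.
function buildTopology(plan: JobPlan): Topology {
  const upstream: Record<string, string[]> = {};
  const downstream: Record<string, string[]> = {};
  // Register every node first so sources and sinks get empty lists.
  for (const node of plan.nodes) {
    upstream[node.id] = [];
    downstream[node.id] = [];
  }
  for (const node of plan.nodes) {
    for (const input of node.inputs ?? []) {
      // An input edge input.id -> node.id makes each the other's neighbour.
      upstream[node.id].push(input.id);
      if (!(input.id in downstream)) {
        downstream[input.id] = [];
      }
      downstream[input.id].push(node.id);
    }
  }
  return { upstream, downstream };
}

// Usage: a three-vertex pipeline source -> map -> sink.
const planJson = `{"nodes": [
  {"id": "source"},
  {"id": "map", "inputs": [{"id": "source"}]},
  {"id": "sink", "inputs": [{"id": "map"}]}
]}`;
const topology = buildTopology(JSON.parse(planJson) as JobPlan);
console.log(topology.downstream["source"]); // [ 'map' ]
console.log(topology.upstream["sink"]); // [ 'map' ]
```

With these neighbour lists, a network bottleneck check for one vertex can look at the metrics of topology.upstream[id] and topology.downstream[id] instead of the vertex alone.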
TL;DR: Given the current, improved bottleneck detection (coloring nodes based on
their backPressured/busy/idle times), additionally detecting network back pressure
might simply not be worth the effort. I'm open to revisiting this decision in the future.
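Coloring nodes from the backPressured/busy/idle times, as described above, can be sketched as follows. The input fields are modelled on Flink's backPressuredTimeMsPerSecond, busyTimeMsPerSecond and idleTimeMsPerSecond task metrics; the "dominant state" rule and the blue-to-red color scale are hypothetical choices for illustration, not the actual WebUI implementation.

```typescript
// Time spent per second (in milliseconds) in each state for one vertex,
// e.g. the maximum over its subtasks. Field names follow Flink's task metrics.
interface VertexTimes {
  backPressuredTimeMsPerSecond: number;
  busyTimeMsPerSecond: number;
  idleTimeMsPerSecond: number;
}

type NodeState = "backpressured" | "busy" | "idle";

// Hypothetical rule: label a node by whichever state dominates its second;
// ties favour "backpressured", then "busy".
function dominantState(t: VertexTimes): NodeState {
  const bp = t.backPressuredTimeMsPerSecond;
  const busy = t.busyTimeMsPerSecond;
  const idle = t.idleTimeMsPerSecond;
  if (bp >= busy && bp >= idle) return "backpressured";
  if (busy >= idle) return "busy";
  return "idle";
}

// Hypothetical color scale: interpolate from blue (0% busy) to red (100% busy).
function busyColor(t: VertexTimes): string {
  const ratio = Math.min(Math.max(t.busyTimeMsPerSecond / 1000, 0), 1);
  const red = Math.round(255 * ratio);
  const blue = Math.round(255 * (1 - ratio));
  return `rgb(${red}, 0, ${blue})`;
}

// Usage
console.log(dominantState({ backPressuredTimeMsPerSecond: 800, busyTimeMsPerSecond: 150, idleTimeMsPerSecond: 50 })); // backpressured
console.log(busyColor({ backPressuredTimeMsPerSecond: 0, busyTimeMsPerSecond: 500, idleTimeMsPerSecond: 500 })); // rgb(128, 0, 128)
```

Because these three times are per-vertex metrics, such a coloring needs no knowledge of how the vertices are connected, which is the contrast the comment above draws with network back pressure detection.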
> Better BackPressure Detection in WebUI
> --------------------------------------
>
> Key: FLINK-14127
> URL: https://issues.apache.org/jira/browse/FLINK-14127
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Web Frontend
> Affects Versions: 1.10.0
> Reporter: Yadong Xie
> Priority: Major
> Attachments: 屏幕快照 2019-09-19 下午6.00.05.png, 屏幕快照 2019-09-19 下午6.00.57.png,
> 屏幕快照 2019-09-19 下午6.01.43.png (screenshots taken on 2019-09-19 at 6:00:05 PM,
> 6:00:57 PM and 6:01:43 PM)
>
>
> According to the
> [Document|https://ci.apache.org/projects/flink/flink-docs-release-1.9/monitoring/back_pressure.html],
> the backpressure monitor is only triggered on request and is currently not
> available via metrics. This means that the web UI has no way to show the
> backpressure state of all vertices at the same time; users need to click
> on each vertex to get its backpressure state.
> !屏幕快照 2019-09-19 下午6.00.05.png|width=510,height=197!
> In Flink 1.9.0 and above, four metrics are available (outPoolUsage,
> inPoolUsage, floatingBuffersUsage, exclusiveBuffersUsage). We can use these
> metrics to determine whether there is possible backpressure, and then use the
> backpressure REST API to confirm it.
> Here is a table taken from
> [https://flink.apache.org/2019/07/23/flink-network-stack-2.html]:
> !屏幕快照 2019-09-19 下午6.00.57.png|width=516,height=304!
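Using two of these metrics to flag possible backpressure could be sketched as below. This is a simplified reading of the outPoolUsage/inPoolUsage part of such a table, not a transcription of it: the 0.5 "high usage" threshold and the verdict names are hypothetical, floatingBuffersUsage and exclusiveBuffersUsage are left out, and any verdict would still need to be confirmed with the backpressure REST API, as the issue proposes.

```typescript
// Buffer pool usage of one vertex, each value a fraction in [0, 1].
interface BufferUsage {
  outPoolUsage: number;
  inPoolUsage: number;
}

type Verdict = "ok" | "warning" | "backpressured" | "possible-source";

// Hypothetical threshold separating "low" from "high" usage.
const HIGH_USAGE = 0.5;

// Simplified heuristic: a full output pool means the vertex cannot hand data
// to its consumers fast enough; a full input pool means it cannot keep up
// with its producers.
function classify(vertex: BufferUsage, upstream: BufferUsage[]): Verdict {
  const outHigh = vertex.outPoolUsage >= HIGH_USAGE;
  const inHigh = vertex.inPoolUsage >= HIGH_USAGE;
  if (outHigh && inHigh) {
    // Back-pressured by downstream and probably forwarding it upstream.
    return "backpressured";
  }
  if (outHigh) {
    // Output is full but input is not (yet): possibly a temporary state.
    return "warning";
  }
  if (inHigh) {
    // Input is full while output is free: if an upstream vertex has a full
    // output pool, this vertex may be the source of the back pressure.
    const upstreamBlocked = upstream.some((u) => u.outPoolUsage >= HIGH_USAGE);
    return upstreamBlocked ? "possible-source" : "warning";
  }
  return "ok";
}

// Usage: a slow "map" vertex behind a source whose output pool is full.
const source: BufferUsage = { outPoolUsage: 0.9, inPoolUsage: 0 };
const map: BufferUsage = { outPoolUsage: 0.1, inPoolUsage: 0.95 };
console.log(classify(map, [source])); // possible-source
```

Note that the "possible-source" case depends on the upstream vertices' metrics, which is why this check needs the job graph's connections and not just per-vertex values.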
>
> We can display the possible backpressure status on the vertex graph, so that
> users can see the backpressure state of every vertex at once and quickly locate
> potential problems.
>
> !屏幕快照 2019-09-19 下午6.01.43.png|width=572,height=277!
>
> REST API changes needed:
> add the outPoolUsage, inPoolUsage, floatingBuffersUsage and exclusiveBuffersUsage
> metrics for each vertex to the /jobs/:jobId API.
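How the WebUI might consume such an extended response is sketched below. The VertexWithBufferMetrics shape (a "metrics" object on each vertex) and the 0.5 threshold are hypothetical, since the proposed fields do not exist in the /jobs/:jobId response; the sketch only builds the per-vertex back pressure URLs (assumed to follow Flink's /jobs/:jobid/vertices/:vertexid/backpressure endpoint) that would be queried to confirm a suspicion, and does not perform any requests.

```typescript
// Hypothetical vertex entry in an extended /jobs/:jobId response carrying the
// four buffer usage metrics proposed above (field names are assumptions).
interface VertexWithBufferMetrics {
  id: string;
  name: string;
  metrics: {
    outPoolUsage: number;
    inPoolUsage: number;
    floatingBuffersUsage: number;
    exclusiveBuffersUsage: number;
  };
}

// Hypothetical threshold above which a vertex is worth a closer look.
const SUSPICIOUS_USAGE = 0.5;

// Selects vertices whose buffer usage hints at back pressure and returns the
// back pressure URLs the WebUI could query to confirm it.
function backPressureUrlsToCheck(jobId: string, vertices: VertexWithBufferMetrics[]): string[] {
  return vertices
    .filter(
      (v) =>
        v.metrics.outPoolUsage >= SUSPICIOUS_USAGE ||
        v.metrics.inPoolUsage >= SUSPICIOUS_USAGE,
    )
    .map((v) => `/jobs/${jobId}/vertices/${v.id}/backpressure`);
}

// Usage: only the first vertex looks suspicious.
const vertices: VertexWithBufferMetrics[] = [
  { id: "v1", name: "Source", metrics: { outPoolUsage: 0.9, inPoolUsage: 0, floatingBuffersUsage: 0, exclusiveBuffersUsage: 0 } },
  { id: "v2", name: "Sink", metrics: { outPoolUsage: 0, inPoolUsage: 0.2, floatingBuffersUsage: 0.1, exclusiveBuffersUsage: 0.2 } },
];
console.log(backPressureUrlsToCheck("job-1", vertices)); // [ '/jobs/job-1/vertices/v1/backpressure' ]
```

Filtering first keeps the on-request back pressure sampling limited to vertices the cheap metrics already flag, instead of triggering it for every vertex.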
--
This message was sent by Atlassian Jira
(v8.3.4#803005)