[
https://issues.apache.org/jira/browse/FLINK-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236099#comment-17236099
]
Matthias commented on FLINK-14712:
----------------------------------
Commenting here because I came across this issue during the Engine team's
backlog grooming, while looking at the subtask FLINK-14814:
[~friendmine][~lining] Can you verify that this is still something worth
working on? We might want to unassign the issues if you're busy and
therefore unable to proceed with them.
> Improve back-pressure reporting mechanism
> -----------------------------------------
>
> Key: FLINK-14712
> URL: https://issues.apache.org/jira/browse/FLINK-14712
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Metrics, Runtime / Network, Runtime / REST
> Reporter: lining
> Assignee: lining
> Priority: Major
> Attachments: image-2019-11-12-14-30-16-130.png
>
>
> h4. (1) The current monitor is heavy-weight.
> * Backpressure monitoring works by repeatedly taking stack trace samples
> of your running tasks.
> h4. (2) It is difficult to find out which vertex is the source of
> backpressure.
> * Users need to know the network metrics of both the current vertex and
> its upstream vertices to judge whether the current vertex is the source of
> backpressure. At present, users have to record the relevant information
> themselves.
> h3. Proposed Changes
> 1. Expose the new mechanism implemented in FLINK-14472 as an "is
> back-pressured" metric.
> 2. Show the vertex that is the source of backpressure for the job.
> 3. Expose network metrics in IOMetricsInfo:
> * SubTask
> ** pool usage: outPoolUsage, inputExclusiveBuffersUsage,
> inputFloatingBuffersUsage.
> *** Shows whether the subtask is not back-pressured itself but is causing
> backpressure (full input, empty output).
> *** By comparing exclusive/floating buffers usage, shows whether all
> channels are back-pressured or only some of them.
> ** back-pressured: shows whether the subtask is back-pressured.
> * Vertex
> ** pool usage: outPoolUsageAvg, inputExclusiveBuffersUsageAvg,
> inputFloatingBuffersUsageAvg
> ** back-pressured: shows whether the vertex is back-pressured (merged over
> all of its subtasks)
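To make the pool-usage heuristic in the description above more concrete, here is a minimal Java sketch of how the proposed values might be interpreted. It is not Flink code: the class, the `Usage` holder, the `Role` enum, the 0.9/0.1 thresholds, the "either input pool is full" rule, and the "any subtask" merge rule are all assumptions made for illustration. Only the metric names (outPoolUsage, inputExclusiveBuffersUsage, inputFloatingBuffersUsage) and the "full input, empty output" rule come from the issue.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; every type and method here is hypothetical, not Flink API. */
final class BackPressureSketch {

    // Assumed cut-offs: usage is a ratio in [0, 1]; the issue defines no thresholds.
    static final double FULL = 0.9;
    static final double EMPTY = 0.1;

    enum Role { BACK_PRESSURED, LIKELY_SOURCE, OK }

    /** Pool usage of one subtask (or a per-vertex average), named after the proposed metrics. */
    static final class Usage {
        final double outPoolUsage;
        final double inputExclusiveBuffersUsage;
        final double inputFloatingBuffersUsage;

        Usage(double out, double exclusive, double floating) {
            this.outPoolUsage = out;
            this.inputExclusiveBuffersUsage = exclusive;
            this.inputFloatingBuffersUsage = floating;
        }
    }

    /** Full output pool: back-pressured. Full input with empty output: likely source. */
    static Role classify(Usage u) {
        if (u.outPoolUsage >= FULL) {
            return Role.BACK_PRESSURED;
        }
        // Assumption: the input side counts as full if either input pool is full.
        double inputUsage = Math.max(u.inputExclusiveBuffersUsage, u.inputFloatingBuffersUsage);
        if (inputUsage >= FULL && u.outPoolUsage <= EMPTY) {
            return Role.LIKELY_SOURCE;
        }
        return Role.OK;
    }

    /** Vertex-level view: the plain average of each usage value over all subtasks. */
    static Usage average(List<Usage> subtasks) {
        double out = 0, exclusive = 0, floating = 0;
        for (Usage u : subtasks) {
            out += u.outPoolUsage;
            exclusive += u.inputExclusiveBuffersUsage;
            floating += u.inputFloatingBuffersUsage;
        }
        int n = Math.max(1, subtasks.size()); // avoid division by zero for an empty vertex
        return new Usage(out / n, exclusive / n, floating / n);
    }

    /** Assumed merge rule: a vertex is back-pressured if any of its subtasks is. */
    static boolean isVertexBackPressured(List<Usage> subtasks) {
        for (Usage u : subtasks) {
            if (classify(u) == Role.BACK_PRESSURED) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Usage> subtasks = Arrays.asList(
                new Usage(0.0, 1.0, 1.0),   // full input, empty output
                new Usage(1.0, 0.5, 0.0));  // full output pool
        for (Usage u : subtasks) {
            System.out.println(classify(u));
        }
        System.out.println(isVertexBackPressured(subtasks));
    }
}
```

The merge rule for the vertex-level flag is left open in the description; "any subtask" is only one option, and a ratio of back-pressured subtasks would fit the same structure.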
--
This message was sent by Atlassian Jira
(v8.3.4#803005)