[
https://issues.apache.org/jira/browse/FLINK-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
lining updated FLINK-14712:
---------------------------
Description:
h4. (1) The current monitor is heavy-weight.
* Backpressure monitoring works by repeatedly taking stack trace samples of
your running tasks.
h4. (2) It is difficult to find out which vertex is the source of
backpressure.
* Users need to know the network metrics of both the current vertex and its
upstream vertices to judge whether the current vertex is the source of
backpressure. Currently, users have to record the relevant information themselves.
h3. Proposed Changes
1. Expose the new mechanism implemented in FLINK-14472 as an "is back-pressured"
metric.
2. Show the vertex that is the source of backpressure for the job.
3. Expose network pool usage in IOMetricsInfo, which makes it possible to tell:
# whether a subtask is not back-pressured itself but is causing backpressure (full
input, empty output)
# by comparing exclusive/floating buffers usage, whether all channels are
back-pressured or only some of them
{code:java}
public final class IOMetricsInfo {
    // Proposed new fields; the class's other members are omitted here.
    private final float outPoolUsage;
    private final float inputExclusiveBuffersUsage;
    private final float inputFloatingBuffersUsage;
}
{code}
JobDetailsInfo.JobVertexDetailsInfo merges these values using Math.max. (PS:
outPoolUsage is taken from upstream.)
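As a rough illustration of how the proposed fields could be used, the sketch below merges per-subtask usage values with Math.max and then applies the "full input, empty output" check from item 3. It is not Flink code: the class BackPressureSketch, the holder Usage, the thresholds FULL and EMPTY, and the rule that the input counts as full when either input usage crosses the threshold are all assumptions made for this example.

```java
import java.util.List;

/** Hypothetical sketch only; none of these names exist in Flink. */
final class BackPressureSketch {

    /** Assumed threshold at or above which a pool is treated as "full". */
    static final float FULL = 0.9f;
    /** Assumed threshold at or below which a pool is treated as "empty". */
    static final float EMPTY = 0.1f;

    /** Holder for the three proposed usage values, each expected in [0.0, 1.0]. */
    static final class Usage {
        final float outPoolUsage;
        final float inputExclusiveBuffersUsage;
        final float inputFloatingBuffersUsage;

        Usage(float outPoolUsage, float inputExclusiveBuffersUsage, float inputFloatingBuffersUsage) {
            this.outPoolUsage = outPoolUsage;
            this.inputExclusiveBuffersUsage = inputExclusiveBuffersUsage;
            this.inputFloatingBuffersUsage = inputFloatingBuffersUsage;
        }
    }

    /** Merges subtask values into one vertex-level value by taking the maximum of each field. */
    static Usage merge(List<Usage> subtasks) {
        float out = 0f;
        float exclusive = 0f;
        float floating = 0f;
        for (Usage u : subtasks) {
            out = Math.max(out, u.outPoolUsage);
            exclusive = Math.max(exclusive, u.inputExclusiveBuffersUsage);
            floating = Math.max(floating, u.inputFloatingBuffersUsage);
        }
        return new Usage(out, exclusive, floating);
    }

    /**
     * Returns true for "full input, empty output": the vertex is not back-pressured
     * itself but is likely causing backpressure. Treating the input as full when
     * either input usage reaches FULL is an assumption of this sketch.
     */
    static boolean isLikelySource(Usage u) {
        float inputUsage = Math.max(u.inputExclusiveBuffersUsage, u.inputFloatingBuffersUsage);
        return inputUsage >= FULL && u.outPoolUsage <= EMPTY;
    }
}
```

Under these assumptions, a caller would build one Usage per subtask, pass the list to merge, and flag the vertex when isLikelySource returns true for the merged value. Taking the maximum means a single saturated subtask is enough to mark the whole vertex.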
was:
h4. (1) The current monitor is heavy-weight.
* Backpressure monitoring works by repeatedly taking stack trace samples of
your running tasks.
h4. (2) It is difficult to find out which vertex is the source of
backpressure.
* User need to know current and upstream's network metric to judge current
whether is the source of backpressure. Now user has to record relevant
information.
h3. Proposed Changes
1. expose the new mechanism implemented in FLINK-14472 as a "is back-pressured"
metric.
2. show the vertex that produces the backpressure source for the job.
3. expose n IOMetricsInfo:
# if sub task is not back pressured, but it is causing a back pressure (full
input, empty output)
# by comparing exclusive/floating buffers usage, whether all channels are
back-pressured or only some of them
{code:java}
public final class IOMetricsInfo {
private final float outPoolUsage;
private final float inputExclusiveBuffersUsage;
private final float inputFloatingBuffersUsage;
}
{code}
JobDetailsInfo.JobVertexDetailsInfo merge use Math.max.(ps: outPoolUsage is
from upstream)
> Improve back-pressure reporting mechanism
> -----------------------------------------
>
> Key: FLINK-14712
> URL: https://issues.apache.org/jira/browse/FLINK-14712
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Metrics, Runtime / Network, Runtime / REST
> Reporter: lining
> Assignee: lining
> Priority: Major
> Attachments: image-2019-11-12-14-30-16-130.png