[
https://issues.apache.org/jira/browse/FLINK-14815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978494#comment-16978494
]
Piotr Nowojski commented on FLINK-14815:
----------------------------------------
{quote}
For the pool usages, in the case of data skew, although the average is very
low, the status of a task is not good.
{quote}
I'm assuming that we will present an "isBackpressured" flag to the user
somewhere, for example by changing the colour of the job's vertex to red, so
the "not good" status would be visible to the user even if the average pool
usage is low. Reporting the max of {{outPoolUsage}} is therefore largely
redundant with the back-pressure status: if at least one subtask is
back-pressured, {{max(outPoolUsage) ~= 100%}}, otherwise
{{max(outPoolUsage) ~= 0}}.
Reporting the average, on the other hand, gives extra information. If at least
one subtask is back-pressured, {{average(outPoolUsage)}} tells you whether the
back-pressure is affecting all subtasks or only a fraction of them.
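To make the difference concrete, here is a rough sketch with made-up numbers.
{{PoolUsageSummary}} is a hypothetical helper, not existing Flink code, and it
assumes each subtask reports {{outPoolUsage}} as a value between 0.0 and 1.0:

```java
// Hypothetical helper, not part of Flink: compares two ways of aggregating
// per-subtask outPoolUsage values (each assumed to be in the range 0.0 to 1.0).
final class PoolUsageSummary {

    // Highest usage among all subtasks. It is close to 1.0 as soon as a single
    // subtask is back-pressured, so it mirrors the back-pressure flag.
    static float max(float[] usages) {
        float result = 0.0f;
        for (float usage : usages) {
            result = Math.max(result, usage);
        }
        return result;
    }

    // Mean usage over all subtasks. When individual usages sit near 0 or 1,
    // this approximates the fraction of subtasks that are back-pressured.
    static float average(float[] usages) {
        if (usages.length == 0) {
            return 0.0f;
        }
        float sum = 0.0f;
        for (float usage : usages) {
            sum += usage;
        }
        return sum / usages.length;
    }

    public static void main(String[] args) {
        float[] skewed = {1.0f, 0.0f, 0.0f, 0.0f};  // one of four subtasks back-pressured
        float[] uniform = {1.0f, 1.0f, 1.0f, 1.0f}; // all four subtasks back-pressured

        System.out.println("skewed:  max=" + max(skewed) + " avg=" + average(skewed));
        System.out.println("uniform: max=" + max(uniform) + " avg=" + average(uniform));
    }
}
```

In both cases the max is 1.0, so it cannot tell data skew apart from uniform
back-pressure, while the average is 0.25 in the skewed case and 1.0 in the
uniform one.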
> Expose network pool usage in IOMetricsInfo
> ------------------------------------------
>
> Key: FLINK-14815
> URL: https://issues.apache.org/jira/browse/FLINK-14815
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Metrics, Runtime / Network, Runtime / REST
> Reporter: lining
> Assignee: lining
> Priority: Major
>
> * If a subtask is not back-pressured, but it is causing back-pressure (full
> input, empty output)
> * By comparing exclusive/floating buffers usage, whether all channels are
> back-pressured or only some of them
> {code:java}
> public final class IOMetricsInfo {
>     private final float outPoolUsage;
>     private final float inputExclusiveBuffersUsage;
>     private final float inputFloatingBuffersUsage;
> }
> {code}
> The JobDetailsInfo.JobVertexDetailsInfo merge uses Math.max. (PS:
> outPoolUsage is from upstream.)
--
This message was sent by Atlassian Jira
(v8.3.4#803005)