[ https://issues.apache.org/jira/browse/FLINK-12576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938405#comment-16938405 ]
David Anderson commented on FLINK-12576:
----------------------------------------
OK, I see what's going on now, at least to some extent: the input queue length
metric is behaving as documented.
I wasn't focused on the input queue length metric when I re-opened this ticket;
I was only looking at the inPoolUsage metric and the exclusive and floating
buffer metrics. Are these metrics also intended to ignore local input channels?
If so, then I guess the only bug is in the documentation, which fails to
explain this.
> inputQueueLength metric does not work for LocalInputChannels
> ------------------------------------------------------------
>
> Key: FLINK-12576
> URL: https://issues.apache.org/jira/browse/FLINK-12576
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Metrics, Runtime / Network
> Affects Versions: 1.6.4, 1.7.2, 1.8.0, 1.9.0
> Reporter: Piotr Nowojski
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.9.0
>
> Attachments: Screen Shot 2019-09-24 at 3.11.15 PM.png, Screen Shot
> 2019-09-24 at 3.13.05 PM.png, Screen Shot 2019-09-24 at 3.22.36 PM.png,
> Screen Shot 2019-09-24 at 3.22.53 PM.png,
> flink-1.8-2-single-slot-TMs-input.png,
> flink-1.8-2-single-slot-TMs-output.png, flink-1.8-input-subtasks.png,
> flink-1.8-output-subtasks.png, image-2019-09-26-11-34-24-878.png,
> image-2019-09-26-11-36-06-027.png
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Currently {{inputQueueLength}} ignores LocalInputChannels
> ({{SingleInputGate#getNumberOfQueuedBuffers}}). This can cause mistakes
> when looking for the causes of back pressure (for example, if a task is
> back pressuring the whole Flink job but, because of data skew, only its
> local input channels are being used).
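To make the reported behavior concrete, here is a minimal sketch of how a queue-length metric that skips local channels can read as empty under data skew. It models only the aggregation step. The `Channel` record, the two counting methods, and the per-channel buffer counts are hypothetical illustrations and not Flink's actual `SingleInputGate` or `InputChannel` API. The sketch also does not claim that a real LocalInputChannel holds a receiver-side queue in the same way a remote channel does.

```java
import java.util.List;

// Hypothetical, simplified model of an input gate's channels. These types are
// illustrative only and are NOT Flink's SingleInputGate / InputChannel API.
public class QueueLengthSketch {

    /** A channel is either local (same TaskManager) or remote (over the network). */
    record Channel(String name, boolean local, int queuedBuffers) {}

    /** Mirrors the reported behavior: local channels are skipped entirely. */
    static int queueLengthRemoteOnly(List<Channel> channels) {
        int total = 0;
        for (Channel c : channels) {
            if (!c.local()) {
                total += c.queuedBuffers();
            }
        }
        return total;
    }

    /** The alternative the issue points toward: count every channel. */
    static int queueLengthAllChannels(List<Channel> channels) {
        return channels.stream().mapToInt(Channel::queuedBuffers).sum();
    }

    public static void main(String[] args) {
        // Data skew (made-up numbers): all traffic arrives over the local channel.
        List<Channel> gate = List.of(
                new Channel("local-0", true, 42),
                new Channel("remote-1", false, 0));
        System.out.println("remote-only: " + queueLengthRemoteOnly(gate));   // prints "remote-only: 0"
        System.out.println("all channels: " + queueLengthAllChannels(gate)); // prints "all channels: 42"
    }
}
```

In this skewed example, the remote-only count is 0 even though the task has a backlog. That is the misleading signal the issue describes when hunting for the source of back pressure.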