[ https://issues.apache.org/jira/browse/FLINK-25414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Piotr Nowojski updated FLINK-25414:
-----------------------------------
Description:
Currently the back pressured/busy metrics tell the user whether a task is
blocked/busy and what percentage of the time it is blocked/busy. But they do not
tell how long a single blocking event lasts. It can be 1ms or 1h, and the
back pressure/busy metrics would still report 100%.
In order to improve this, we could provide two new metrics:
# maxSoftBackPressureTime
# maxHardBackPressureTime
The max would be reset to 0 periodically or on every access to the metric (via
the metric reporter). Soft back pressure would be when the task is back pressured
in a non-blocking fashion (StreamTask detected unavailability of the output). Hard
back pressure would measure the time the task is actually blocked.
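A minimal Java sketch of the proposed max metrics, implementing the "reset on every access" variant. It assumes the task thread signals the start and end of each back pressure event and a metric reporter thread reads the value; `MaxDurationTracker`, `markStart`, `markEnd` and `getAndResetMax` are hypothetical names, not part of Flink. One instance would be used for soft and one for hard back pressure. An event that is still ongoing at read time is counted up to the read, so a 1h block keeps being reported for as long as it lasts.

```java
/**
 * Hypothetical tracker for the longest back pressure event since the last read.
 * Timestamps are passed in explicitly (milliseconds) to keep the sketch testable.
 */
class MaxDurationTracker {
    private long maxFinishedMs = 0;   // longest event that ended since the last read
    private long ongoingStartMs = -1; // start of the current event, or -1 if none

    /** Called by the task thread when a back pressure event begins. */
    synchronized void markStart(long nowMs) {
        if (ongoingStartMs < 0) {
            ongoingStartMs = nowMs;
        }
    }

    /** Called by the task thread when the back pressure event ends. */
    synchronized void markEnd(long nowMs) {
        if (ongoingStartMs >= 0) {
            maxFinishedMs = Math.max(maxFinishedMs, nowMs - ongoingStartMs);
            ongoingStartMs = -1;
        }
    }

    /**
     * Called by the metric reporter: returns the longest event since the previous
     * call, counting a still-ongoing event up to nowMs, then resets the max to 0.
     */
    synchronized long getAndResetMax(long nowMs) {
        long ongoingMs = ongoingStartMs >= 0 ? nowMs - ongoingStartMs : 0;
        long result = Math.max(maxFinishedMs, ongoingMs);
        maxFinishedMs = 0;
        return result;
    }
}
```

In Flink, the reporter-facing side could be exposed through the `org.apache.flink.metrics.Gauge` interface by calling `getAndResetMax` from `getValue()`. With several reporters attached, each read would reset the value seen by the others, which is one argument for the periodic-reset alternative.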
To calculate those metrics, I'm proposing to also split the already existing
backPressuredTimeMsPerSecond into soft and hard versions.
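A minimal Java sketch of splitting backPressuredTimeMsPerSecond into soft and hard parts. It assumes the task thread reports every transition between no, soft and hard back pressure, and that some periodic timer calls `update`; `BackPressureState`, `SoftHardBackPressureTimer` and its getter names are hypothetical, not Flink's actual classes. Time spent in each state is accumulated in a separate bucket and converted to milliseconds per second over the last update interval.

```java
/** Hypothetical back pressure state of a task; names are illustrative only. */
enum BackPressureState { NONE, SOFT, HARD }

/** Sketch: per-second soft and hard back pressure time, recomputed on update(). */
class SoftHardBackPressureTimer {
    private BackPressureState state = BackPressureState.NONE;
    private long lastSwitchMs;
    private long intervalStartMs;
    private long softAccumulatedMs = 0;
    private long hardAccumulatedMs = 0;
    private long softMsPerSecond = 0;
    private long hardMsPerSecond = 0;

    SoftHardBackPressureTimer(long nowMs) {
        this.lastSwitchMs = nowMs;
        this.intervalStartMs = nowMs;
    }

    /** Adds the time since the last switch to the bucket of the current state. */
    private void accumulate(long nowMs) {
        long elapsed = nowMs - lastSwitchMs;
        if (state == BackPressureState.SOFT) {
            softAccumulatedMs += elapsed;
        } else if (state == BackPressureState.HARD) {
            hardAccumulatedMs += elapsed;
        }
        lastSwitchMs = nowMs;
    }

    /** Called by the task thread whenever the back pressure state changes. */
    synchronized void switchTo(BackPressureState newState, long nowMs) {
        accumulate(nowMs);
        state = newState;
    }

    /** Closes the current interval and recomputes the per-second values. */
    synchronized void update(long nowMs) {
        accumulate(nowMs);
        long intervalMs = nowMs - intervalStartMs;
        if (intervalMs > 0) {
            softMsPerSecond = softAccumulatedMs * 1000 / intervalMs;
            hardMsPerSecond = hardAccumulatedMs * 1000 / intervalMs;
        }
        softAccumulatedMs = 0;
        hardAccumulatedMs = 0;
        intervalStartMs = nowMs;
    }

    synchronized long getSoftBackPressuredTimeMsPerSecond() { return softMsPerSecond; }

    synchronized long getHardBackPressuredTimeMsPerSecond() { return hardMsPerSecond; }

    /** Combined value as the sum of both parts (integer division may cost 1 ms). */
    synchronized long getBackPressuredTimeMsPerSecond() {
        return softMsPerSecond + hardMsPerSecond;
    }
}
```

Because both values are derived from the same state transitions, the existing combined backPressuredTimeMsPerSecond could still be reported as their sum, up to integer rounding.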
Unfortunately, I don't know how to efficiently provide a similar metric for busy
time without impacting max throughput.
was:
Currently the back pressured/busy metrics tell the user whether a task is
blocked/busy and what percentage of the time it is blocked/busy. But they do not
tell how long a single blocking event lasts. It can be 1ms or 1h, and the
back pressure/busy metrics would still report 100%.
In order to improve this, we could provide two new metrics:
# maxSoftBackPressureDuration
# maxHardBackPressureDuration
The max would be reset to 0 periodically or on every access to the metric (via
the metric reporter). Soft back pressure would be when the task is back pressured
in a non-blocking fashion (StreamTask detected unavailability of the output). Hard
back pressure would measure the time the task is actually blocked.
Unfortunately, I don't know how to efficiently provide a similar metric for busy
time without impacting max throughput.
> Provide metrics to measure how long task has been blocked
> ---------------------------------------------------------
>
> Key: FLINK-25414
> URL: https://issues.apache.org/jira/browse/FLINK-25414
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Metrics, Runtime / Task
> Affects Versions: 1.14.2
> Reporter: Piotr Nowojski
> Assignee: Piotr Nowojski
> Priority: Major
> Labels: pull-request-available
--
This message was sent by Atlassian Jira
(v8.20.1#820001)