[
https://issues.apache.org/jira/browse/FLINK-29615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhanghao Chen updated FLINK-29615:
----------------------------------
Description:
We are exploring autoscaling Flink in Reactive mode, using metrics from the
Flink REST API for guidance, and found that the metrics are not updated correctly.
*Problem*
MetricStore does not remove the metrics of nonexistent subtasks when the
adaptive scheduler lowers job parallelism (i.e., the number of subtasks
decreases), so users still see metrics of nonexistent subtasks in the Web UI
(e.g. the task backpressure page) and in REST API responses. This causes
confusion and wastes memory.
*Proposed Solution*
Thanks to FLINK-29132 and FLINK-28588, Flink now updates the current execution
attempts whenever it updates metrics. Because the current execution attempt
info identifies the active subtasks, we can use it to retain metrics only for
the active subtasks and drop the metrics of all others.
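The retention step could look roughly like the sketch below. It is only an illustration of the idea, not the actual MetricStore code: the class SubtaskMetricRetentionSketch, its methods, and the use of plain subtask indexes as keys are assumptions made for this example, and the metric name is just sample data.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified stand-in for the per-task part of a metric store.
 * It is NOT Flink's MetricStore API; all names here are illustrative.
 */
final class SubtaskMetricRetentionSketch {

    // subtask index -> (metric name -> metric value)
    private final Map<Integer, Map<String, String>> subtaskMetrics = new HashMap<>();

    /** Records one metric value for the given subtask. */
    void addMetric(int subtaskIndex, String name, String value) {
        subtaskMetrics.computeIfAbsent(subtaskIndex, k -> new HashMap<>()).put(name, value);
    }

    /**
     * Keeps metrics only for the subtasks reported as active (assumed to be
     * derived from the current execution attempt info) and drops the rest.
     */
    void retainActiveSubtasks(Set<Integer> activeSubtaskIndexes) {
        // keySet() is a live view, so retainAll removes the stale map entries.
        subtaskMetrics.keySet().retainAll(activeSubtaskIndexes);
    }

    /** Returns the subtask indexes that currently have metrics stored. */
    Set<Integer> subtaskIndexes() {
        return subtaskMetrics.keySet();
    }

    public static void main(String[] args) {
        SubtaskMetricRetentionSketch store = new SubtaskMetricRetentionSketch();
        // Job initially runs with parallelism 4.
        for (int i = 0; i < 4; i++) {
            store.addMetric(i, "backPressuredTimeMsPerSecond", "0");
        }
        // Parallelism is lowered to 2: only subtasks 0 and 1 remain active.
        store.retainActiveSubtasks(new HashSet<>(Arrays.asList(0, 1)));
        System.out.println(store.subtaskIndexes());
    }
}
```

Running the retention step on every metric update would keep the stored subtasks in line with the current parallelism, so the Web UI and REST API would no longer report subtasks that have gone away.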
was:
We are exploring autoscaling Flink with Reactive mode using metrics from Flink
REST for guidance, and found that the metrics are not correctly ** updated.
*Problem*
MetricStore does not remove metrics of nonexistent subtasks when adaptive
scheduler lowers job parallelism (aka, num of subtasks decreases) and users
will see metrics of nonexistent subtasks on Web UI (e.g. the task backpressure
page) or REST API response. It causes confusion and occupies extra memory.
*Proposed Solution*
Thanks to FLINK-29132 & FLINK-28588, Flink will now update current execution
attempts when updating metrics. Since the active subtask info is included in
the current execution attempt info, we are able to retain active subtasks using
the current execution attempt info.
> MetricStore does not remove metrics of nonexistent subtasks when adaptive
> scheduler lowers job parallelism
> ----------------------------------------------------------------------------------------------------------
>
> Key: FLINK-29615
> URL: https://issues.apache.org/jira/browse/FLINK-29615
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Metrics, Runtime / REST
> Affects Versions: 1.15.0, 1.16.0
> Reporter: Zhanghao Chen
> Priority: Major
> Labels: pull-request-available
>
> We are exploring autoscaling Flink in Reactive mode, using metrics from the
> Flink REST API for guidance, and found that the metrics are not updated
> correctly.
>
> *Problem*
> MetricStore does not remove the metrics of nonexistent subtasks when the
> adaptive scheduler lowers job parallelism (i.e., the number of subtasks
> decreases), so users still see metrics of nonexistent subtasks in the Web UI
> (e.g. the task backpressure page) and in REST API responses. This causes
> confusion and wastes memory.
>
> *Proposed Solution*
> Thanks to FLINK-29132 and FLINK-28588, Flink now updates the current
> execution attempts whenever it updates metrics. Because the current execution
> attempt info identifies the active subtasks, we can use it to retain metrics
> only for the active subtasks and drop the metrics of all others.
>