[
https://issues.apache.org/jira/browse/FLINK-29615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620796#comment-17620796
]
Xintong Song commented on FLINK-29615:
--------------------------------------
Thanks for reporting and fixing this, [~Zhanghao Chen]. I'll take a look at the
PR.
> MetricStore does not remove metrics of nonexistent subtasks when adaptive
> scheduler lowers job parallelism
> ----------------------------------------------------------------------------------------------------------
>
> Key: FLINK-29615
> URL: https://issues.apache.org/jira/browse/FLINK-29615
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Metrics, Runtime / REST
> Affects Versions: 1.15.0, 1.16.0
> Reporter: Zhanghao Chen
> Priority: Major
> Labels: pull-request-available
>
> We are exploring autoscaling Flink in Reactive mode, using metrics from the
> Flink REST API for guidance, and found that the metrics are not updated
> correctly.
>
> *Problem*
> MetricStore does not remove the metrics of nonexistent subtasks when the
> adaptive scheduler lowers job parallelism (i.e., the number of subtasks
> decreases). As a result, users see metrics of nonexistent subtasks in the
> Web UI (e.g. the task backpressure page) and in REST API responses. This
> causes confusion, and the stale metrics occupy extra memory.
>
> *Proposed Solution*
> Thanks to FLINK-29132 & FLINK-28588, Flink now updates the current execution
> attempts when updating metrics. Since the current execution attempt info
> includes the active subtasks, we can use it to retain only the metrics of
> active subtasks and drop the metrics of subtasks that no longer exist.
>
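For illustration, the retain-only-active-subtasks idea in the proposed solution
could be sketched roughly as below. This is a minimal sketch and not Flink's
actual MetricStore code: the class SubtaskMetricStoreSketch, its methods, and
the sample metric values are all hypothetical, and it assumes the set of active
subtask indices has already been derived from the current execution attempt
info.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for a per-task metric store.
// It is NOT Flink's MetricStore; names and structure are assumptions.
class SubtaskMetricStoreSketch {
    // subtask index -> (metric name -> metric value)
    private final Map<Integer, Map<String, String>> subtaskMetrics = new HashMap<>();

    // Record or overwrite one metric value for the given subtask.
    void addMetric(int subtaskIndex, String name, String value) {
        subtaskMetrics.computeIfAbsent(subtaskIndex, k -> new HashMap<>()).put(name, value);
    }

    // Drop the metrics of every subtask whose index is not in the active set.
    // Removing keys through the key-set view also removes the map entries.
    void retainActiveSubtasks(Set<Integer> activeSubtaskIndices) {
        subtaskMetrics.keySet().retainAll(activeSubtaskIndices);
    }

    // Indices of the subtasks that currently have stored metrics.
    Set<Integer> subtaskIndices() {
        return subtaskMetrics.keySet();
    }
}
```

With metrics stored for subtasks 0 to 3, calling retainActiveSubtasks(Set.of(0, 1))
after a rescale to parallelism 2 would leave only subtasks 0 and 1 in the store.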