[ 
https://issues.apache.org/jira/browse/FLINK-29615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhanghao Chen updated FLINK-29615:
----------------------------------
    Description: 
We are exploring autoscaling Flink in Reactive mode, using metrics from the 
Flink REST API for guidance, and found that the metrics are not updated 
correctly.

 

*Problem*

MetricStore does not remove the metrics of nonexistent subtasks when the 
adaptive scheduler lowers job parallelism (i.e., the number of subtasks 
decreases), so users still see metrics of nonexistent subtasks in the Web UI 
(e.g. the task backpressure page) and in REST API responses. This causes 
confusion and wastes memory.

 

*Proposed Solution*

Thanks to FLINK-29132 & FLINK-28588, Flink now updates the current execution 
attempts when updating metrics. Since the current execution attempt info 
includes the active subtasks, we can use it to retain only the metrics of 
active subtasks and drop the rest.
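
As a rough illustration of the idea (not the actual patch), the pruning step 
could look like the sketch below. SubtaskMetricSketch and its methods are 
hypothetical names made up for this example and are not Flink's MetricStore 
API; the set of active subtask indices is assumed to come from the current 
execution attempt info.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified stand-in for a per-task metric store (not Flink's MetricStore). */
class SubtaskMetricSketch {
    // subtask index -> (metric name -> latest reported value)
    private final Map<Integer, Map<String, String>> metricsBySubtask = new HashMap<>();

    /** Records the latest value of a metric for the given subtask. */
    void put(int subtaskIndex, String name, String value) {
        metricsBySubtask.computeIfAbsent(subtaskIndex, k -> new HashMap<>()).put(name, value);
    }

    /**
     * Drops the metrics of every subtask that is not currently active.
     * Assumption: activeSubtaskIndices is derived from the current execution attempts.
     */
    void retainActiveSubtasks(Set<Integer> activeSubtaskIndices) {
        // keySet() is a live view, so retainAll removes the stale entries from the map.
        metricsBySubtask.keySet().retainAll(activeSubtaskIndices);
    }

    /** Returns the subtask indices that still have metrics, in ascending order. */
    Set<Integer> subtaskIndices() {
        return new TreeSet<>(metricsBySubtask.keySet());
    }

    public static void main(String[] args) {
        SubtaskMetricSketch store = new SubtaskMetricSketch();
        // Job initially runs with parallelism 4.
        for (int i = 0; i < 4; i++) {
            store.put(i, "numRecordsIn", "100");
        }
        System.out.println(store.subtaskIndices()); // [0, 1, 2, 3]

        // The scheduler lowers parallelism to 2; only subtasks 0 and 1 remain active.
        store.retainActiveSubtasks(Set.of(0, 1));
        System.out.println(store.subtaskIndices()); // [0, 1]
    }
}
```

Pruning at metric-update time keeps the cleanup in step with the latest 
execution attempt info, so no separate expiry mechanism is needed in this 
sketch.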

 


> MetricStore does not remove metrics of nonexistent subtasks when adaptive 
> scheduler lowers job parallelism
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-29615
>                 URL: https://issues.apache.org/jira/browse/FLINK-29615
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Metrics, Runtime / REST
>    Affects Versions: 1.15.0, 1.16.0
>            Reporter: Zhanghao Chen
>            Priority: Major
>              Labels: pull-request-available



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
