[ https://issues.apache.org/jira/browse/FLINK-33764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gyula Fora closed FLINK-33764.
------------------------------
Resolution: Fixed
Merged to main: f6adb400e1c87f06faec948379c264eebba71166
> Incorporate GC / Heap metrics in autoscaler decisions
> -----------------------------------------------------
>
> Key: FLINK-33764
> URL: https://issues.apache.org/jira/browse/FLINK-33764
> Project: Flink
> Issue Type: New Feature
> Components: Autoscaler, Kubernetes Operator
> Reporter: Gyula Fora
> Assignee: Gyula Fora
> Priority: Major
> Labels: pull-request-available
>
> The autoscaler currently does not use any GC or heap metrics in its
> scaling decisions.
> While the long-term goal may be to support vertical scaling (increasing TM
> sizes), this is currently out of scope for the autoscaler.
> However, it is very important to detect cases where the throughput of certain
> vertices or of the entire pipeline is critically affected by long GC pauses. In
> these cases the current autoscaler logic would wrongly assume a low true
> processing rate and scale the pipeline up too far, driving up costs and
> causing further issues.
> Using the improved GC metrics introduced in
> https://issues.apache.org/jira/browse/FLINK-33318, we should measure the GC
> pauses and simply block scaling decisions if the pipeline spends too much
> time garbage collecting, and notify the user that memory needs to be
> increased.
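
The blocking rule described above can be illustrated with a minimal sketch. It is
not the code merged for this issue: the class name GcPressureGate, the Decision
enum, and the threshold value are all hypothetical, and the sketch assumes the
caller already has the total GC time observed over a known metric window.

```java
import java.time.Duration;

/**
 * Hypothetical sketch of a GC-pressure gate. This is NOT the operator's actual
 * implementation; all names and the threshold are assumptions for illustration.
 */
final class GcPressureGate {

    /** Outcome of the check: either scaling may proceed or it is blocked. */
    enum Decision { ALLOW_SCALING, BLOCK_SCALING }

    // Assumed threshold: maximum tolerated fraction of time spent in GC, in (0, 1].
    private final double maxGcTimeRatio;

    GcPressureGate(double maxGcTimeRatio) {
        if (maxGcTimeRatio <= 0.0 || maxGcTimeRatio > 1.0) {
            throw new IllegalArgumentException("maxGcTimeRatio must be in (0, 1]");
        }
        this.maxGcTimeRatio = maxGcTimeRatio;
    }

    /** Fraction of the observation window that was spent garbage collecting. */
    static double gcTimeRatio(Duration gcTime, Duration window) {
        long windowMs = window.toMillis();
        if (windowMs <= 0) {
            throw new IllegalArgumentException("window must be at least 1 ms");
        }
        return (double) gcTime.toMillis() / windowMs;
    }

    /** Blocks scaling when the GC time ratio exceeds the configured threshold. */
    Decision evaluate(Duration gcTime, Duration window) {
        return gcTimeRatio(gcTime, window) > maxGcTimeRatio
                ? Decision.BLOCK_SCALING
                : Decision.ALLOW_SCALING;
    }

    /** Message a caller could surface to the user when scaling is blocked. */
    static String userMessage(double ratio) {
        return String.format(
                "Scaling blocked: %.0f%% of time spent in GC; consider increasing memory.",
                ratio * 100.0);
    }
}
```

With an assumed threshold of 0.3, a window of 60 seconds containing 30 seconds
of GC would block scaling, while 6 seconds of GC in the same window would not.
The point of gating first is that a low measured processing rate under heavy GC
says little about how much parallelism the job actually needs.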
--
This message was sent by Atlassian Jira
(v8.20.10#820010)