Gyula Fora created FLINK-33764:
----------------------------------
Summary: Incorporate GC / Heap metrics in autoscaler decisions
Key: FLINK-33764
URL: https://issues.apache.org/jira/browse/FLINK-33764
Project: Flink
Issue Type: New Feature
Components: Autoscaler, Kubernetes Operator
Reporter: Gyula Fora
Assignee: Gyula Fora
The autoscaler currently doesn't use any GC / heap metrics as part of its scaling
decisions.
While the long-term goal may be to support vertical scaling (increasing TM
sizes), this is currently out of scope for the autoscaler.
However, it is very important to detect cases where the throughput of certain
vertices or the entire pipeline is critically affected by long GC pauses. In
these cases the current autoscaler logic would wrongly assume a low true
processing rate and scale the pipeline up too far, ramping up costs and causing
further issues.
Using the improved GC metrics introduced in FLINK-33318
(https://issues.apache.org/jira/browse/FLINK-33318), we should measure GC
pauses and simply block scaling decisions if the pipeline spends too much time
garbage collecting. We should also notify the user that the required action is
to increase memory.
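
A minimal sketch of the proposed check, to make the idea concrete. It assumes
each TaskManager reports GC time as milliseconds spent collecting per second of
wall-clock time, and that scaling is blocked once the worst TaskManager exceeds
a configurable ratio. The class name, method names, and the threshold are
hypothetical; none of them are taken from the autoscaler code base.

```java
/** Hypothetical sketch; these names are illustrative and not part of the Flink autoscaler. */
final class GcPressureGate {

    // Assumed setting: maximum tolerated fraction of wall-clock time spent in GC.
    private final double maxGcTimeRatio;

    GcPressureGate(double maxGcTimeRatio) {
        if (maxGcTimeRatio <= 0.0 || maxGcTimeRatio > 1.0) {
            throw new IllegalArgumentException("maxGcTimeRatio must be in (0, 1]");
        }
        this.maxGcTimeRatio = maxGcTimeRatio;
    }

    /** Worst GC time ratio across TaskManagers, given GC milliseconds per second for each. */
    static double worstGcTimeRatio(double... gcMillisPerSecond) {
        double worst = 0.0;
        for (double ms : gcMillisPerSecond) {
            worst = Math.max(worst, ms / 1000.0);
        }
        return worst;
    }

    /** True when any TaskManager spends more than the allowed share of its time in GC. */
    boolean shouldBlockScaling(double... gcMillisPerSecond) {
        return worstGcTimeRatio(gcMillisPerSecond) > maxGcTimeRatio;
    }

    /** Message for the user when scaling is blocked, or an empty string otherwise. */
    String userMessage(double... gcMillisPerSecond) {
        if (!shouldBlockScaling(gcMillisPerSecond)) {
            return "";
        }
        return String.format(
                java.util.Locale.ROOT,
                "Scaling blocked: %.0f%% of time spent in GC (limit %.0f%%). "
                        + "Consider increasing TaskManager memory.",
                worstGcTimeRatio(gcMillisPerSecond) * 100.0,
                maxGcTimeRatio * 100.0);
    }
}
```

With a limit of 0.2, readings of 50 and 350 ms/s would block scaling because
one TaskManager spends 35% of its time in GC, while readings of 50 and 120 ms/s
would let the normal scaling logic proceed. Taking the maximum rather than the
average is a design assumption here: a single memory-starved TaskManager can
throttle the whole pipeline.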