[ https://issues.apache.org/jira/browse/FLINK-27314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528581#comment-17528581 ]
Fuyao Li commented on FLINK-27314:
----------------------------------
Thanks for the comments, [~gyfora] [~wangyang0918]. I re-evaluated this point
recently. I think it might be okay to add this to the Flink operator, since this
is arguably a Kubernetes-deployment-related feature.
Flink metrics are exposed through the REST API. There are metrics such as the
average CPU load across TaskManagers (TMs), which could be a good signal for
triggering a rescale. Since the reconciler already checks the status of the
Flink CR periodically, I assume we can evaluate those scaling configurations as
part of this periodically executed logic. Please correct me if I am wrong.
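As a rough sketch of the REST part, the reconciler could read the average TM CPU load as shown below. This assumes Flink's aggregated endpoint /taskmanagers/metrics with its get and agg query parameters and the TaskManager metric Status.JVM.CPU.Load. The class TaskManagerCpuLoad and its methods are hypothetical (not existing operator code), and the regex-based parsing only stands in for a proper JSON parser.

{code:java}
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Flink operator: reads the average JVM CPU load
// of all TaskManagers from the Flink REST API (assumed aggregated metrics endpoint).
final class TaskManagerCpuLoad {

    static final String CPU_LOAD_METRIC = "Status.JVM.CPU.Load";

    // Simplistic extraction of the "avg" field from a response such as
    // [{"id":"Status.JVM.CPU.Load","avg":0.42}]; a real implementation should use a JSON parser.
    private static final Pattern AVG_FIELD =
            Pattern.compile("\"avg\"\\s*:\\s*(-?\\d+(?:\\.\\d+)?(?:[eE][-+]?\\d+)?)");

    private TaskManagerCpuLoad() {}

    // Builds e.g. http://host:8081/taskmanagers/metrics?get=Status.JVM.CPU.Load&agg=avg
    static URI buildUri(String restBaseUrl) {
        String base = restBaseUrl.endsWith("/")
                ? restBaseUrl.substring(0, restBaseUrl.length() - 1)
                : restBaseUrl;
        return URI.create(base + "/taskmanagers/metrics?get=" + CPU_LOAD_METRIC + "&agg=avg");
    }

    // Returns the averaged value, or NaN when no "avg" field is present (e.g. no TMs yet).
    static double parseAverage(String responseBody) {
        Matcher m = AVG_FIELD.matcher(responseBody);
        return m.find() ? Double.parseDouble(m.group(1)) : Double.NaN;
    }

    // Performs the HTTP call; intended to be invoked once per reconcile cycle.
    static double fetchAverage(HttpClient client, String restBaseUrl)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(buildUri(restBaseUrl)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return parseAverage(response.body());
    }
}
{code}

Each reconcile cycle would call fetchAverage(...) and compare the result with the configured thresholds; a NaN result should probably just skip that cycle.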
In addition, just like the auto scaling group solutions offered by many cloud
vendors, we can add a cool-down time and a buffer time before triggering the
scaling. For example, wait for three consecutive reconcile cycles, and scale
up/down only when the results of all three cycles meet the threshold.
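The consecutive-cycle and cool-down idea could look roughly like the following. ScalingTrigger is a hypothetical class invented for illustration; it assumes observe(...) is called once per reconcile cycle with the current metric value, and that any cycle which does not breach the same threshold resets the count.

{code:java}
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper (not an existing operator API): requests a rescale only after
// N consecutive reconcile cycles breach the same threshold and no rescale happened
// within the cool-down period.
final class ScalingTrigger {

    enum Decision { SCALE_UP, SCALE_DOWN, NONE }

    private final double scaleUpThreshold;
    private final double scaleDownThreshold;
    private final int requiredConsecutiveCycles;
    private final Duration coolDown;

    private Decision pending = Decision.NONE; // direction of the current streak
    private int streak = 0;                   // consecutive cycles breaching in that direction
    private Instant lastRescale = null;       // time of the last triggered rescale

    ScalingTrigger(double scaleUpThreshold, double scaleDownThreshold,
                   int requiredConsecutiveCycles, Duration coolDown) {
        if (scaleDownThreshold >= scaleUpThreshold || requiredConsecutiveCycles < 1) {
            throw new IllegalArgumentException("invalid thresholds or cycle count");
        }
        this.scaleUpThreshold = scaleUpThreshold;
        this.scaleDownThreshold = scaleDownThreshold;
        this.requiredConsecutiveCycles = requiredConsecutiveCycles;
        this.coolDown = coolDown;
    }

    // Called once per reconcile cycle with the observed metric (e.g. average TM CPU load).
    Decision observe(double metric, Instant now) {
        Decision candidate = metric > scaleUpThreshold ? Decision.SCALE_UP
                : metric < scaleDownThreshold ? Decision.SCALE_DOWN
                : Decision.NONE;
        // Extend the streak only if this cycle breaches the same threshold as the previous ones.
        if (candidate != Decision.NONE && candidate == pending) {
            streak++;
        } else {
            pending = candidate;
            streak = candidate == Decision.NONE ? 0 : 1;
        }
        boolean coolingDown = lastRescale != null && now.isBefore(lastRescale.plus(coolDown));
        if (pending != Decision.NONE && streak >= requiredConsecutiveCycles && !coolingDown) {
            Decision result = pending;
            lastRescale = now;
            pending = Decision.NONE;
            streak = 0;
            return result;
        }
        return Decision.NONE; // threshold not met long enough, or still cooling down
    }
}
{code}

With requiredConsecutiveCycles = 3 this matches the "three consecutive reconcile cycles" example. In this sketch the streak keeps growing during the cool-down, so a rescale can fire on the first cycle after the cool-down expires; resetting it instead would be an equally valid design choice.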
> Support reactive mode for native Kubernetes integration in Flink Kubernetes
> Operator
> ------------------------------------------------------------------------------------
>
> Key: FLINK-27314
> URL: https://issues.apache.org/jira/browse/FLINK-27314
> Project: Flink
> Issue Type: New Feature
> Components: Kubernetes Operator
> Reporter: Fuyao Li
> Priority: Major
>
> Generally, this is a low-priority task for now.
> Flink exposes some system-level metrics. The Flink Kubernetes operator could
> monitor these metrics and rescale automatically based on checkpoints (similar to
> standalone reactive mode) and a rescale policy configured by the user.
> The rescale behavior can be based on CPU utilization or memory utilization.
> # Before rescaling, the Flink operator should check whether the cluster has
> enough resources; if not, the rescaling will be aborted.
> # We can add an additional field to support this feature. The fields below
> are just a rough suggestion.
> {code:yaml}
> reactiveScaling:
>   enabled: boolean
>   scaleMetric: enum ["CPU", "MEM"]
>   scaleDownThreshold:
>   scaleUpThreshold:
>   minimumLimit:
>   maximumLimit:
>   increasePolicy: <increase/decrease exponentially or linearly...>
>   <some other timeout configuration...>
> {code}
>
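To make the proposed fields a bit more concrete, here is a minimal sketch of how minimumLimit, maximumLimit and increasePolicy, together with the resource check from item 1, might be evaluated. ReactiveScalingPlanner, linearStep and the free-resource inputs are hypothetical; the sketch also assumes the limits count TaskManagers, which the proposal leaves open.

{code:java}
// Hypothetical sketch of the proposed reactiveScaling fields; limits are interpreted as
// TaskManager counts (an assumption, the proposal does not fix the unit).
final class ReactiveScalingPlanner {

    enum IncreasePolicy { LINEAR, EXPONENTIAL }

    private final int minimumLimit;
    private final int maximumLimit;
    private final IncreasePolicy increasePolicy;
    private final int linearStep; // hypothetical: TMs added/removed per LINEAR step

    ReactiveScalingPlanner(int minimumLimit, int maximumLimit,
                           IncreasePolicy increasePolicy, int linearStep) {
        if (minimumLimit < 1 || maximumLimit < minimumLimit || linearStep < 1) {
            throw new IllegalArgumentException("invalid scaling limits");
        }
        this.minimumLimit = minimumLimit;
        this.maximumLimit = maximumLimit;
        this.increasePolicy = increasePolicy;
        this.linearStep = linearStep;
    }

    // Target TM count for one scale-up or scale-down step, clamped to [minimumLimit, maximumLimit].
    int nextTaskManagers(int current, boolean scaleUp) {
        long next; // long avoids overflow when doubling
        if (increasePolicy == IncreasePolicy.EXPONENTIAL) {
            next = scaleUp ? (long) current * 2 : current / 2;
        } else {
            next = scaleUp ? (long) current + linearStep : (long) current - linearStep;
        }
        return (int) Math.max(minimumLimit, Math.min(maximumLimit, next));
    }

    // Item 1 of the proposal: abort the rescale if the cluster cannot host the extra TMs.
    // Free CPU/memory are assumed inputs (e.g. derived from node allocatable minus requests).
    static boolean hasEnoughResources(int currentTms, int targetTms,
                                      double cpuPerTm, long memoryMbPerTm,
                                      double freeCpu, long freeMemoryMb) {
        int extra = targetTms - currentTms;
        if (extra <= 0) {
            return true; // scaling down or staying put needs no extra resources
        }
        return extra * cpuPerTm <= freeCpu && (long) extra * memoryMbPerTm <= freeMemoryMb;
    }
}
{code}

Comparing against total free CPU and memory is a simplification: each new TaskManager pod must fit on a single node, so a real check would likely need per-node allocatable resources.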
--
This message was sent by Atlassian Jira
(v8.20.7#820007)