[jira] [Commented] (FLINK-27314) Support reactive mode for native Kubernetes integration in Flink Kubernetes Operator

Fuyao Li (Jira) Tue, 26 Apr 2022 23:43:05 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-27314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528581#comment-17528581
 ]


Fuyao Li commented on FLINK-27314:
----------------------------------

Thanks for the comments, [~gyfora] [~wangyang0918]. I reevaluated this point 
recently. I think it might be okay to add this to the Flink operator, since it 
is essentially a Kubernetes deployment related feature.

Flink metrics are exposed through the REST API. Some of them, such as the 
average CPU load across TMs, could be a good signal for triggering a rescale. 
Since the reconciler already checks the status of the Flink CR periodically, I 
assume we can evaluate these metrics against the configured thresholds in that 
periodically executed logic. Please correct me if I am wrong.
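
As a rough illustration of the metric lookup, here is a minimal Python sketch 
(not operator code). It assumes the reconciler has already fetched an 
aggregated task manager metrics response over REST, and that the payload is a 
JSON array of objects with "id" and "avg" fields. The metric name 
Status.JVM.CPU.Load, the payload shape, and the helper name 
average_tm_cpu_load are assumptions made for this example.

```python
import json


def average_tm_cpu_load(payload, metric_id="Status.JVM.CPU.Load"):
    """Return the 'avg' value for metric_id from a JSON payload, or None.

    The payload is assumed to be a JSON array of objects such as
    {"id": "...", "min": ..., "max": ..., "avg": ...}.
    """
    for entry in json.loads(payload):
        if entry.get("id") == metric_id and "avg" in entry:
            return float(entry["avg"])
    return None


# Hypothetical response body, hard-coded so the sketch needs no cluster.
sample = '[{"id": "Status.JVM.CPU.Load", "min": 0.2, "max": 0.8, "avg": 0.5}]'
print(average_tm_cpu_load(sample))
```

Returning None when the metric is missing lets the reconcile cycle skip the 
scaling check instead of acting on incomplete data.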

In addition, like the auto scaling group solutions offered by many cloud 
vendors, we can add a cool-down time and a buffer time before triggering the 
scaling. For example, wait for three consecutive reconcile cycles, and scale 
up or down only when the results of all three cycles meet the threshold.
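
The buffering idea above could look roughly like the following Python sketch. 
It is only an illustration under assumptions: the class name ScalingDecider, 
its parameters, and the default values (three cycles, 300 seconds of 
cool-down) are hypothetical and not part of the operator.

```python
class ScalingDecider:
    """Emit 'up' or 'down' only after N consecutive breaching cycles,
    and never sooner than cooldown_seconds after the previous action."""

    def __init__(self, scale_up_threshold, scale_down_threshold,
                 required_cycles=3, cooldown_seconds=300.0):
        self.scale_up_threshold = scale_up_threshold
        self.scale_down_threshold = scale_down_threshold
        self.required_cycles = required_cycles
        self.cooldown_seconds = cooldown_seconds
        self._streak_direction = None
        self._streak = 0
        self._last_scale_time = None

    def observe(self, load, now):
        """Record one reconcile-cycle observation; return a decision or None."""
        if load > self.scale_up_threshold:
            direction = "up"
        elif load < self.scale_down_threshold:
            direction = "down"
        else:
            direction = None

        # A cycle inside the thresholds, or a change of direction,
        # restarts the streak.
        if direction is None or direction != self._streak_direction:
            self._streak_direction = direction
            self._streak = 1 if direction else 0
        else:
            self._streak += 1

        in_cooldown = (self._last_scale_time is not None
                       and now - self._last_scale_time < self.cooldown_seconds)
        if (direction is None or self._streak < self.required_cycles
                or in_cooldown):
            return None

        # Act, then reset the streak and start the cool-down window.
        self._streak_direction = None
        self._streak = 0
        self._last_scale_time = now
        return direction


decider = ScalingDecider(scale_up_threshold=0.8, scale_down_threshold=0.3)
decisions = [decider.observe(0.9, now=t) for t in (0, 60, 120)]
print(decisions)
```

Passing the timestamp in explicitly keeps the logic independent of the wall 
clock, so it can be driven by the reconcile interval.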

> Support reactive mode for native Kubernetes integration in Flink Kubernetes 
> Operator
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-27314
>                 URL: https://issues.apache.org/jira/browse/FLINK-27314
>             Project: Flink
>          Issue Type: New Feature
>          Components: Kubernetes Operator
>            Reporter: Fuyao Li
>            Priority: Major
>
> Generally, this is a low-priority task for now.
> Flink exposes some system-level metrics. The Flink Kubernetes operator can 
> detect these metrics and rescale automatically based on checkpoints (similar 
> to standalone reactive mode) and the rescale policy configured by users.
> The rescale behavior can be based on CPU utilization or memory utilization.
>  # Before rescaling, the Flink operator should check whether the cluster has 
> enough resources; if not, the rescaling will be aborted.
>  # We can create an additional field to support this feature. The fields below 
> are just a rough suggestion.
> {code:java}
> reactiveScaling:
>   enabled: boolean
>   scaleMetric:  enum ["CPU", "MEM"]
>   scaleDownThreshold:
>   scaleUpThreshold:
>   minimumLimit:
>   maximumLimit:
>   increasePolicy: <increase/decrease exponentially or linearly..>
> <some other timeout configuration...>{code}
>  
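
To make the increasePolicy, minimumLimit, and maximumLimit fields from the 
suggestion above concrete, here is a small Python sketch of how a target 
parallelism might be derived. The function name next_parallelism and the step 
and factor parameters are hypothetical; the issue does not define them.

```python
def next_parallelism(current, direction, policy, minimum, maximum,
                     step=1, factor=2):
    """Compute the next parallelism, clamped to [minimum, maximum].

    policy is "linear" (add or subtract step) or "exponential"
    (multiply or integer-divide by factor).
    """
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    if policy == "linear":
        target = current + step if direction == "up" else current - step
    elif policy == "exponential":
        target = current * factor if direction == "up" else current // factor
    else:
        raise ValueError("policy must be 'linear' or 'exponential'")
    # Never leave the range configured by the user.
    return max(minimum, min(maximum, target))


print(next_parallelism(4, "up", "exponential", minimum=1, maximum=16))
```

Clamping after applying the policy means a breach at the limit becomes a 
no-op, which the operator could detect by comparing the result with the 
current value.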



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
