[ 
https://issues.apache.org/jira/browse/FLINK-39826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora closed FLINK-39826.
------------------------------
      Assignee: Dennis-Mircea Ciupitu
    Resolution: Fixed

Merged to main: 34edcc45851d523e97edb6f5ccc0ceb945a32dd7

> Strengthen autoscaler configuration validation
> ----------------------------------------------
>
>                 Key: FLINK-39826
>                 URL: https://issues.apache.org/jira/browse/FLINK-39826
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.15.0
>            Reporter: Dennis-Mircea Ciupitu
>            Assignee: Dennis-Mircea Ciupitu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: kubernetes-operator-1.16.0
>
>
> h1. Summary
> Several autoscaler configuration options are not validated, so invalid values 
> are accepted silently and surface only as confusing runtime behavior or, in 
> one case, as autoscaling that never runs. This issue tightens autoscaler 
> configuration validation to reject these misconfigurations when the resource 
> is submitted, instead of letting them degrade scaling silently.
> h1. Background and Gaps
> h2. Unbounded numeric options
> The autoscaler validator currently bounds only a subset of numeric options 
> (the utilization target, its min and max, and the scale factors). Several 
> other ratio-style options are left unchecked:
> - {{job.autoscaler.memory.gc-pressure.threshold}}
> - {{job.autoscaler.memory.heap-usage.threshold}}
> - {{job.autoscaler.scaling.effectiveness.threshold}}
> - {{job.autoscaler.memory.tuning.overhead}}
> These are all fractions that are only meaningful within the [0, 1] range, yet 
> out-of-range values are accepted today. For example, a scaling effectiveness 
> threshold above 1 silently blocks all scale ups, and a negative memory tuning 
> overhead can drive the tuned memory below the observed usage.
> In addition, the observed scalability coefficient minimum is validated 
> unconditionally, even though it only takes effect when observed scalability 
> is enabled. Options that only matter behind a feature flag should be 
> validated only when that feature is on; otherwise a harmless value can be 
> rejected.
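
For illustration, the range checks and the feature-gated check described above could look roughly like the sketch below. This is not the operator's actual validator: the class name, the method names, the flat string map, and the two observed-scalability keys are assumptions made for the example. Only the four fraction keys come from the issue text, and the [0, 1] bound on the gated option is likewise assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of range validation; not the operator's real validator. */
final class AutoscalerRangeCheck {

    // Keys listed in the issue; each is a fraction that only makes sense in [0, 1].
    static final List<String> FRACTION_KEYS = List.of(
            "job.autoscaler.memory.gc-pressure.threshold",
            "job.autoscaler.memory.heap-usage.threshold",
            "job.autoscaler.scaling.effectiveness.threshold",
            "job.autoscaler.memory.tuning.overhead");

    // Assumed key names for the feature flag and its gated option (illustrative only).
    static final String OBSERVED_SCALABILITY_ENABLED =
            "job.autoscaler.observed-scalability.enabled";
    static final String OBSERVED_SCALABILITY_COEFFICIENT_MIN =
            "job.autoscaler.observed-scalability.coefficient-min";

    /** Returns one error message per invalid option; an empty list means valid. */
    static List<String> validate(Map<String, String> conf) {
        List<String> errors = new ArrayList<>();
        for (String key : FRACTION_KEYS) {
            checkFraction(conf, key, errors);
        }
        // Gated option: only checked when the feature is switched on.
        if (Boolean.parseBoolean(conf.getOrDefault(OBSERVED_SCALABILITY_ENABLED, "false"))) {
            checkFraction(conf, OBSERVED_SCALABILITY_COEFFICIENT_MIN, errors);
        }
        return errors;
    }

    private static void checkFraction(
            Map<String, String> conf, String key, List<String> errors) {
        String raw = conf.get(key);
        if (raw == null) {
            return; // Unset: the default applies, nothing to validate.
        }
        try {
            double value = Double.parseDouble(raw);
            if (Double.isNaN(value) || value < 0.0 || value > 1.0) {
                errors.add(key + " must be within [0, 1], but was " + raw);
            }
        } catch (NumberFormatException e) {
            errors.add(key + " is not a number: " + raw);
        }
    }
}
```

Collecting all errors instead of failing on the first one lets a single rejection report every out-of-range option at once.
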
> h2. Metric window smaller than the reconcile interval
> The autoscaler collects one metric sample per reconcile loop and requires at 
> least two samples within the metric window before it evaluates scaling. If 
> the metric window is configured smaller than the operator reconcile interval, 
> the window is trimmed down to a single sample on every loop, the two-sample 
> requirement is never met, and autoscaling is never applied. Nothing validates 
> this relationship today, so the autoscaler appears enabled while silently 
> doing nothing.
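
To make the arithmetic above concrete, here is a minimal sketch of the relationship check. The class and method names are hypothetical, and `samplesInWindow` is an idealized model that assumes exactly one sample per reconcile loop and a window boundary that includes the oldest sample; the operator's real trimming logic may differ.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical sketch of the metric-window check; not the operator's real code. */
final class MetricWindowCheck {

    /** Returns an error message if the window cannot hold two samples. */
    static Optional<String> validate(Duration metricWindow, Duration reconcileInterval) {
        if (metricWindow.compareTo(reconcileInterval) < 0) {
            return Optional.of(
                    "Metric window " + metricWindow
                            + " is smaller than the reconcile interval " + reconcileInterval
                            + "; fewer than two samples would fit and autoscaling would never run");
        }
        return Optional.empty();
    }

    /**
     * Idealized count of samples retained in the window, assuming one sample
     * per reconcile loop and an inclusive window boundary.
     */
    static long samplesInWindow(Duration metricWindow, Duration reconcileInterval) {
        if (reconcileInterval.isZero() || reconcileInterval.isNegative()) {
            throw new IllegalArgumentException("Reconcile interval must be positive");
        }
        // The newest sample always counts; each full interval adds one older sample.
        return metricWindow.toMillis() / reconcileInterval.toMillis() + 1;
    }
}
```

Under these assumptions, a 30-second window with a 60-second reconcile interval keeps one sample, so the two-sample requirement is never met, while a 5-minute window with the same interval keeps six.
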
> h1. Goal
> Validate the above at resource submission time so misconfigurations are 
> reported as clear errors instead of silently degrading or disabling 
> autoscaling. Feature-gated options are validated only when their feature is 
> enabled, to avoid rejecting values that have no effect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
