[
https://issues.apache.org/jira/browse/FLINK-33771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17898124#comment-17898124
]
Ryan van Huuksloot commented on FLINK-33771:
--------------------------------------------
I'd be curious to know more about the ResourceQuota work that [~gsomogyi] (or
[~gaborgsomogyi]; two accounts) has been doing. Internally, we have
observability issues when ResourceQuotas limit TaskManager creation: nothing
meaningful is surfaced on the FlinkDeployment, and the relevant logs appear
only in the operator. It would be better for users to have a status field
indicating whether a ResourceQuota (or anything else) is limiting a job's
growth.
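A minimal sketch of such a status check follows. It assumes the operator can read a quota's `hard` and `used` amounts, which a Kubernetes ResourceQuota reports in its status. `ScalingLimitStatus`, `check_quota`, and the `ResourceQuotaExceeded` reason are hypothetical names, not part of the operator's API. CPU is in cores and memory in MiB to keep the arithmetic simple.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ScalingLimitStatus:
    """Hypothetical status entry that could be surfaced on the FlinkDeployment."""
    limited: bool
    reason: Optional[str] = None
    message: Optional[str] = None


def check_quota(pods_needed: int, cpu_per_pod: float, mem_mib_per_pod: int,
                hard: Dict[str, float], used: Dict[str, float]) -> ScalingLimitStatus:
    """Hypothetical check: compare the resources needed for new TaskManager pods
    against the remaining headroom (hard - used) of a ResourceQuota."""
    need = {
        "cpu": pods_needed * cpu_per_pod,
        "memory": pods_needed * mem_mib_per_pod,
        "pods": pods_needed,
    }
    for resource, amount in need.items():
        if resource not in hard:
            continue  # this quota does not constrain the resource
        headroom = hard[resource] - used.get(resource, 0)
        if amount > headroom:
            # Report the first constrained resource so users see why scaling stalled.
            return ScalingLimitStatus(
                limited=True,
                reason="ResourceQuotaExceeded",
                message=f"{resource}: need {amount}, quota headroom {headroom}",
            )
    return ScalingLimitStatus(limited=False)
```

For example, requesting four TaskManagers of 2 CPU each against a quota with 6 CPU of headroom yields `limited=True` with a message naming `cpu`. That is the kind of signal the comment asks to see on the deployment rather than only in the operator logs.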
> Add cluster capacity awareness to Autoscaler
> --------------------------------------------
>
> Key: FLINK-33771
> URL: https://issues.apache.org/jira/browse/FLINK-33771
> Project: Flink
> Issue Type: New Feature
> Components: Autoscaler, Kubernetes Operator
> Reporter: Maximilian Michels
> Assignee: Maximilian Michels
> Priority: Major
> Labels: pull-request-available
> Fix For: kubernetes-operator-1.8.0
>
>
> To avoid starvation of pipelines when the Kubernetes cluster runs out of
> resources, new scaling attempts should be stopped.
> The Rescaling API will probably prevent most of these cases, but we will also
> have to double-check there.
> For the config-based parallelism overrides, we have pretty good heuristics in
> the operator to check in Kubernetes for the approximate number of free
> cluster resources, the max cluster scaleup for the Cluster Autoscaler, and
> the required scaling costs.
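The heuristic described in the issue can be sketched roughly as follows. This is an illustration under stated assumptions, not the operator's implementation. Per-node figures are assumed to be each node's allocatable resources minus the pod requests already scheduled on it. The Cluster Autoscaler headroom is modeled as a maximum number of additional nodes of one fixed size. The scaling cost is taken to be the extra TaskManagers needed when each TaskManager offers a fixed number of slots. All names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NodeResources:
    """Hypothetical per-node view: allocatable capacity and summed pod requests."""
    allocatable_cpu: float
    allocatable_mem_mib: int
    requested_cpu: float
    requested_mem_mib: int


def free_capacity(nodes: List[NodeResources]) -> Tuple[float, int]:
    """Approximate free cluster resources as the sum of per-node headroom."""
    cpu = sum(max(n.allocatable_cpu - n.requested_cpu, 0.0) for n in nodes)
    mem = sum(max(n.allocatable_mem_mib - n.requested_mem_mib, 0) for n in nodes)
    return cpu, mem


def extra_task_managers(current_parallelism: int, target_parallelism: int,
                        slots_per_tm: int) -> int:
    """Assumed scaling cost: additional TaskManagers needed for the new parallelism."""
    current = math.ceil(current_parallelism / slots_per_tm)
    target = math.ceil(target_parallelism / slots_per_tm)
    return max(target - current, 0)


def has_capacity_for_scale_up(nodes: List[NodeResources], max_new_nodes: int,
                              new_node_cpu: float, new_node_mem_mib: int,
                              extra_tms: int, tm_cpu: float, tm_mem_mib: int) -> bool:
    """Return True if the extra TaskManagers fit into free capacity plus what the
    Cluster Autoscaler could still add (max_new_nodes nodes of a fixed size)."""
    free_cpu, free_mem = free_capacity(nodes)
    free_cpu += max_new_nodes * new_node_cpu
    free_mem += max_new_nodes * new_node_mem_mib
    return extra_tms * tm_cpu <= free_cpu and extra_tms * tm_mem_mib <= free_mem
```

Summing free capacity across nodes is deliberately approximate: it ignores fragmentation, so a TaskManager may still fail to schedule when no single node has enough room. A stricter check would bin-pack the new pods onto individual nodes.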
--
This message was sent by Atlassian Jira
(v8.20.10#820010)