Sriram Ganesh created FLINK-31924:
-------------------------------------
Summary: [Flink operator] Flink Autoscale - Limit the max number
of scale ups
Key: FLINK-31924
URL: https://issues.apache.org/jira/browse/FLINK-31924
Project: Flink
Issue Type: Improvement
Affects Versions: kubernetes-operator-1.4.0
Reporter: Sriram Ganesh
The autoscaler keeps scaling the job even after it has reached max-parallelism.
Flink version: 1.17
Source: Kafka
Configuration:
{code:yaml}
flinkConfiguration:
  kubernetes.operator.job.autoscaler.enabled: "true"
  kubernetes.operator.job.autoscaler.scaling.sources.enabled: "true"
  kubernetes.operator.job.autoscaler.target.utilization: "0.6"
  kubernetes.operator.job.autoscaler.target.utilization.boundary: "0.2"
  kubernetes.operator.job.autoscaler.stabilization.interval: "1m"
  kubernetes.operator.job.autoscaler.metrics.window: "3m"
{code}
Logs:
{code:java}
2023-04-24 12:29:10,738 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] Starting reconciliation
2023-04-24 12:29:10,740 o.a.f.k.o.s.FlinkResourceContextFactory [INFO ][my-namespace/my-pod] Getting service for my-job
2023-04-24 12:29:10,740 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Observing job status
2023-04-24 12:29:10,765 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Job status changed from CREATED to RUNNING
2023-04-24 12:29:10,870 o.a.f.k.o.l.AuditUtils [INFO ][my-namespace/my-pod] >>> Event | Info | JOBSTATUSCHANGED | Job status changed from CREATED to RUNNING
2023-04-24 12:29:10,938 o.a.f.k.o.l.AuditUtils [INFO ][my-namespace/my-pod] >>> Status | Info | STABLE | The resource deployment is considered to be stable and won’t be rolled back
2023-04-24 12:29:10,986 o.a.f.k.o.a.ScalingMetricCollector [INFO ][my-namespace/my-pod] Skipping metric collection during stabilization period until 2023-04-24T12:30:10.765Z
2023-04-24 12:29:10,986 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][my-namespace/my-pod] Resource fully reconciled, nothing to do...
2023-04-24 12:29:10,986 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] End of reconciliation
2023-04-24 12:29:25,991 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] Starting reconciliation
2023-04-24 12:29:25,992 o.a.f.k.o.s.FlinkResourceContextFactory [INFO ][my-namespace/my-pod] Getting service for my-job
2023-04-24 12:29:25,992 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Observing job status
2023-04-24 12:29:26,005 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Job status (RUNNING) unchanged
2023-04-24 12:29:26,053 o.a.f.k.o.a.ScalingMetricCollector [INFO ][my-namespace/my-pod] Skipping metric collection during stabilization period until 2023-04-24T12:30:10.765Z
2023-04-24 12:29:26,054 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][my-namespace/my-pod] Resource fully reconciled, nothing to do...
2023-04-24 12:29:26,054 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] End of reconciliation
2023-04-24 12:29:41,059 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] Starting reconciliation
2023-04-24 12:29:41,060 o.a.f.k.o.s.FlinkResourceContextFactory [INFO ][my-namespace/my-pod] Getting service for my-job
2023-04-24 12:29:41,061 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Observing job status
2023-04-24 12:29:41,075 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Job status (RUNNING) unchanged
2023-04-24 12:29:41,116 o.a.f.k.o.a.ScalingMetricCollector [INFO ][my-namespace/my-pod] Skipping metric collection during stabilization period until 2023-04-24T12:30:10.765Z
2023-04-24 12:29:41,116 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][my-namespace/my-pod] Resource fully reconciled, nothing to do...
2023-04-24 12:29:41,116 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] End of reconciliation
2023-04-24 12:29:56,121 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] Starting reconciliation
2023-04-24 12:29:56,122 o.a.f.k.o.s.FlinkResourceContextFactory [INFO ][my-namespace/my-pod] Getting service for my-job
2023-04-24 12:29:56,122 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Observing job status
2023-04-24 12:29:56,134 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Job status (RUNNING) unchanged
2023-04-24 12:29:56,178 o.a.f.k.o.a.ScalingMetricCollector [INFO ][my-namespace/my-pod] Skipping metric collection during stabilization period until 2023-04-24T12:30:10.765Z
2023-04-24 12:29:56,179 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][my-namespace/my-pod] Resource fully reconciled, nothing to do...
2023-04-24 12:29:56,179 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] End of reconciliation
2023-04-24 12:30:11,183 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] Starting reconciliation
2023-04-24 12:30:11,184 o.a.f.k.o.s.FlinkResourceContextFactory [INFO ][my-namespace/my-pod] Getting service for my-job
2023-04-24 12:30:11,184 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Observing job status
2023-04-24 12:30:11,193 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Job status (RUNNING) unchanged
2023-04-24 12:30:11,367 o.a.f.k.o.a.m.ScalingMetrics [ERROR][my-namespace/my-pod] Cannot compute source target data rate without numRecordsInPerSecond and pendingRecords (lag) metric for e5a72f353fc1e6bbf3bd96a41384998c.
2023-04-24 12:30:11,370 o.a.f.k.o.a.ScalingMetricCollector [INFO ][my-namespace/my-pod] Waiting until 2023-04-24T12:33:10.765Z so the initial metric window is full before starting scaling
2023-04-24 12:30:11,370 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][my-namespace/my-pod] Resource fully reconciled, nothing to do...
2023-04-24 12:30:11,370 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] End of reconciliation
2023-04-24 12:30:26,374 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] Starting reconciliation
2023-04-24 12:30:26,375 o.a.f.k.o.s.FlinkResourceContextFactory [INFO ][my-namespace/my-pod] Getting service for my-job
2023-04-24 12:30:26,376 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Observing job status
2023-04-24 12:30:26,385 o.a.f.k.o.o.JobStatusObserver [INFO ][my-namespace/my-pod] Job status (RUNNING) unchanged
2023-04-24 12:30:26,542 o.a.f.k.o.a.m.ScalingMetrics [ERROR][my-namespace/my-pod] Cannot compute source target data rate without numRecordsInPerSecond and pendingRecords (lag) metric for e5a72f353fc1e6bbf3bd96a41384998c.
2023-04-24 12:30:26,543 o.a.f.k.o.a.ScalingMetricCollector [INFO ][my-namespace/my-pod] Waiting until 2023-04-24T12:33:10.765Z so the initial metric window is full before starting scaling
2023-04-24 12:30:26,543 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][my-namespace/my-pod] Resource fully reconciled, nothing to do...
2023-04-24 12:30:26,544 o.a.f.k.o.c.FlinkDeploymentController [INFO ][my-namespace/my-pod] End of reconciliation
{code}
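For anyone reading the log, the two deadlines it mentions appear to follow directly from the configuration. A minimal sketch of that arithmetic follows. The inputs are copied from the report, but the rule itself (stabilization ends at job RUNNING time plus stabilization.interval, and the first metric window is full one metrics.window later) is an assumption inferred from the log lines, not taken from the operator source. The variable names are illustrative only.

```python
from datetime import datetime, timedelta

# Values copied from the report: the job turned RUNNING at 12:29:10,765.
job_running_at = datetime.fromisoformat("2023-04-24T12:29:10.765")
stabilization_interval = timedelta(minutes=1)  # ...autoscaler.stabilization.interval: "1m"
metrics_window = timedelta(minutes=3)          # ...autoscaler.metrics.window: "3m"

# Assumed rule (inferred from the log, not from the operator code):
# metric collection is skipped until the stabilization interval has passed,
# then scaling waits until one full metrics window has been collected.
stabilization_end = job_running_at + stabilization_interval
window_full_at = stabilization_end + metrics_window

print(stabilization_end.isoformat(timespec="milliseconds"))  # 2023-04-24T12:30:10.765
print(window_full_at.isoformat(timespec="milliseconds"))     # 2023-04-24T12:33:10.765
```

Both values match the "Skipping metric collection during stabilization period until ..." and "Waiting until ... so the initial metric window is full" messages, so the excerpt ends before the autoscaler would have made any scaling decision.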
--
This message was sent by Atlassian Jira
(v8.20.10#820010)