Hi all,

Sindhu in my team has hit an issue. She has requested access to the dev list but has not been granted it yet.
We are running the latest Operator release (1.12) with Flink 1.20.1, and we are including the autoscaler jar. While setting up the autoscaler we get the following error:

    java.lang.RuntimeException: Missing required TM metrics
        at org.apache.flink.autoscaler.RestApiMetricsCollector.queryTmMetrics(RestApiMetricsCollector.java:179)
        at org.apache.flink.autoscaler.ScalingMetricCollector.updateMetrics(ScalingMetricCollector.java:137)
        at org.apache.flink.autoscaler.JobAutoScalerImpl.runScalingLogic(JobAutoScalerImpl.java:183)
        at org.apache.flink.autoscaler.JobAutoScalerImpl.scale(JobAutoScalerImpl.java:103)
        at org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler.applyAutoscaler(AbstractFlinkResourceReconciler.java:219)
        at org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler.reconcile(AbstractFlinkResourceReconciler.java:142)
        at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:155)
        at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:62)
        at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:153)
        at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:111)
        at org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80)
        at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:110)
        at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:136)
        at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:117)
        at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:91)
        at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:64)
        at io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:452)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)

The Flink config is:

    job.autoscaler.metrics.window: 3m
    taskmanager.memory.jvm-metaspace.size: 256 mb
    metrics.system-resource: 'true'
    pipeline.max-parallelism: '24'
    taskmanager.network.detailed-metrics: 'true'
    job.autoscaler.target.utilization.boundary: '0.1'
    job.autoscaler.catch-up.duration: 5m
    job.autoscaler.restart.time: 2m
    job.autoscaler.scaling.enabled: 'true'
    job.autoscaler.stabilization.interval: 1m
    job.autoscaler.enabled: 'true'
    jobmanager.scheduler: adaptive

It looks like the code throwing this is:
https://github.com/apache/flink-kubernetes-operator/blob/4ef79591f641e3c76c41b279103aab358bee3327/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/RestApiMetricsCollector.java#L179

and that it is looking for the keys defined at:
https://github.com/apache/flink-kubernetes-operator/blob/4ef79591f641e3c76c41b279103aab358bee3327/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/RestApiMetricsCollector.java#L60

It is looking for, but not finding, these metrics:

    "Status.JVM.Memory.Heap.Max"
    "Status.JVM.Memory.Heap.Used"
    "Status.Flink.Memory.Managed.Used"
    "Status.JVM.Memory.Metaspace.Used"

We have tried adding a metric reporter
as below:

    defaultConfiguration:
      create: true
      # Set append to false to replace configuration files
      append: true
      flink-conf.yaml: |+
        # Flink Config Overrides
        kubernetes.operator.metrics.reporter.slf4j.factory.class: org.apache.flink.metrics.slf4j.Slf4jReporterFactory
        kubernetes.operator.metrics.reporter.slf4j.interval: 5 MINUTE
        kubernetes.operator.reconcile.interval: 15 s
        kubernetes.operator.observer.progress-check.interval: 5 s

Any ideas what we might be missing?

Kind regards,
David.
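One way to check, independently of the autoscaler, whether the JobManager REST API reports these four TaskManager metrics at all is to query the aggregated TaskManager metrics endpoint (/taskmanagers/metrics with a "get" query parameter) and see which names come back. The Java sketch below is a minimal diagnostic under stated assumptions, not the autoscaler's own collection logic: it assumes the JobManager REST port is reachable at http://localhost:8081 (for example via kubectl port-forward), the class name TmMetricsCheck is hypothetical, and it pulls the "id" fields out of the JSON response with a crude regex rather than a JSON library to stay dependency-free.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic class; not part of Flink or the operator.
public class TmMetricsCheck {

    // The four TaskManager metric names the autoscaler reported as missing.
    static final List<String> REQUIRED = List.of(
            "Status.JVM.Memory.Heap.Max",
            "Status.JVM.Memory.Heap.Used",
            "Status.Flink.Memory.Managed.Used",
            "Status.JVM.Memory.Metaspace.Used");

    // Crude extraction of "id" values from the JSON array returned by the endpoint.
    private static final Pattern ID = Pattern.compile("\"id\"\\s*:\\s*\"([^\"]+)\"");

    /** Returns the required metric names that do not appear in the response body. */
    static List<String> missingMetrics(String responseBody) {
        Set<String> found = new HashSet<>();
        Matcher m = ID.matcher(responseBody);
        while (m.find()) {
            found.add(m.group(1));
        }
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!found.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: JobManager REST API reachable here (override with the first argument).
        String base = args.length > 0 ? args[0] : "http://localhost:8081";
        String get = URLEncoder.encode(String.join(",", REQUIRED), StandardCharsets.UTF_8);
        // Metrics aggregated across all registered TaskManagers.
        URI uri = URI.create(base + "/taskmanagers/metrics?get=" + get);
        HttpResponse<String> resp = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(uri).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + resp.statusCode() + ": " + resp.body());
        System.out.println("Missing: " + missingMetrics(resp.body()));
    }
}
```

On Java 11 or later it can be run as a single source file, e.g. "java TmMetricsCheck.java http://localhost:8081". An empty "Missing" list would suggest the REST API does expose all four metrics, which would point the investigation elsewhere.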