Hi all,
Sindhu in my team has hit an issue. She has requested access to the dev list
but has not been given it yet.

We are running the latest Flink Kubernetes Operator (1.12) with Flink 1.20.1,
and we are including the autoscaler jar.

We are trying to set up the autoscaler and are getting the following error:

java.lang.RuntimeException: Missing required TM metrics
    at org.apache.flink.autoscaler.RestApiMetricsCollector.queryTmMetrics(RestApiMetricsCollector.java:179)
    at org.apache.flink.autoscaler.ScalingMetricCollector.updateMetrics(ScalingMetricCollector.java:137)
    at org.apache.flink.autoscaler.JobAutoScalerImpl.runScalingLogic(JobAutoScalerImpl.java:183)
    at org.apache.flink.autoscaler.JobAutoScalerImpl.scale(JobAutoScalerImpl.java:103)
    at org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler.applyAutoscaler(AbstractFlinkResourceReconciler.java:219)
    at org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler.reconcile(AbstractFlinkResourceReconciler.java:142)
    at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:155)
    at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:62)
    at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:153)
    at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:111)
    at org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80)
    at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:110)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:136)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:117)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:91)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:64)
    at io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:452)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)

The Flink config is:

job.autoscaler.metrics.window: 3m
taskmanager.memory.jvm-metaspace.size: 256 mb
metrics.system-resource: 'true'
pipeline.max-parallelism: '24'
taskmanager.network.detailed-metrics: 'true'
job.autoscaler.target.utilization.boundary: '0.1'
job.autoscaler.catch-up.duration: 5m
job.autoscaler.restart.time: 2m
job.autoscaler.scaling.enabled: 'true'
job.autoscaler.stabilization.interval: 1m
job.autoscaler.enabled: 'true'
jobmanager.scheduler: adaptive


It looks like the exception is thrown here:
https://github.com/apache/flink-kubernetes-operator/blob/4ef79591f641e3c76c41b279103aab358bee3327/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/RestApiMetricsCollector.java#L179

It also looks like the metric names it looks up are defined here:
https://github.com/apache/flink-kubernetes-operator/blob/4ef79591f641e3c76c41b279103aab358bee3327/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/RestApiMetricsCollector.java#L60

It is looking for, but not finding, these TaskManager metrics:
"Status.JVM.Memory.Heap.Max"
"Status.JVM.Memory.Heap.Used"
"Status.Flink.Memory.Managed.Used"
"Status.JVM.Memory.Metaspace.Used"
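
To narrow this down, one thing we could try is asking the JobManager REST API directly which of these four metrics it returns. Below is a rough Java sketch of such a probe, not the operator's own code. It assumes (we have not confirmed this in the operator source) that the autoscaler reads these values from the aggregated TaskManager metrics endpoint, /taskmanagers/metrics?get=..., that the REST API is reachable at http://localhost:8081 (for example through a kubectl port-forward to the deployment's "<name>-rest" service), and that the response is a JSON array of objects carrying an "id" field. The class and method names are our own.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: checks which required TM metrics the REST API returns.
class TmMetricsProbe {

    // The four TM metric names the autoscaler reports as missing.
    static final List<String> REQUIRED = List.of(
            "Status.JVM.Memory.Heap.Max",
            "Status.JVM.Memory.Heap.Used",
            "Status.Flink.Memory.Managed.Used",
            "Status.JVM.Memory.Metaspace.Used");

    // Builds a query against the (assumed) aggregated TaskManager metrics endpoint.
    // The metric names contain only letters and dots, so no URL encoding is applied.
    static String buildQueryUrl(String restBase, List<String> names) {
        return restBase + "/taskmanagers/metrics?get=" + String.join(",", names);
    }

    // Extracts every "id":"..." value from the JSON response with a simple regex,
    // which is enough for a flat array of metric objects.
    static Set<String> returnedIds(String json) {
        Set<String> ids = new LinkedHashSet<>();
        Matcher m = Pattern.compile("\"id\"\\s*:\\s*\"([^\"]+)\"").matcher(json);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    // Returns the required names absent from the response, in REQUIRED order.
    static List<String> missing(String json, List<String> required) {
        Set<String> ids = returnedIds(json);
        List<String> out = new ArrayList<>();
        for (String name : required) {
            if (!ids.contains(name)) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: REST endpoint base URL passed as the first argument.
        String base = args.length > 0 ? args[0] : "http://localhost:8081";
        HttpRequest req = HttpRequest.newBuilder(URI.create(buildQueryUrl(base, REQUIRED)))
                .GET()
                .build();
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + resp.statusCode() + ": " + resp.body());
        List<String> gone = missing(resp.body(), REQUIRED);
        System.out.println(gone.isEmpty() ? "All required TM metrics present" : "Missing: " + gone);
    }
}
```

It can be run with "java TmMetricsProbe.java http://localhost:8081" on Java 11 or later. An empty result would suggest the metrics are exposed and the problem lies elsewhere; a non-empty one would point at the TaskManagers not reporting them.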

We have tried adding a metrics reporter as below:

defaultConfiguration:
  create: true
  # Set append to false to replace configuration files
  append: true
  flink-conf.yaml: |+
    # Flink Config Overrides
    kubernetes.operator.metrics.reporter.slf4j.factory.class: org.apache.flink.metrics.slf4j.Slf4jReporterFactory
    kubernetes.operator.metrics.reporter.slf4j.interval: 5 MINUTE

    kubernetes.operator.reconcile.interval: 15 s
    kubernetes.operator.observer.progress-check.interval: 5 s

Any ideas what we might be missing?

Kind regards, David.



Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: Building C, IBM Hursley Office, Hursley Park Road, 
Winchester, Hampshire SO21 2JN
