Sai Sharath Dandi created FLINK-36645:
-----------------------------------------
Summary: Gracefully handle null execution plan from autoscaler
Key: FLINK-36645
URL: https://issues.apache.org/jira/browse/FLINK-36645
Project: Flink
Issue Type: Improvement
Components: Autoscaler
Reporter: Sai Sharath Dandi
Occasionally, we see error logs like the one below from the autoscaler module.
This happens because the job manager REST API returns a null execution plan
while a scaling action is already in progress.
{code:java}
Error while scaling job
java.lang.NullPointerException: Cannot invoke "org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ArrayNode.iterator()" because "nodes" is null
    at o.a.f.a.topology.JobTopology.fromJsonPlan(JobTopology.java:155)
    at o.a.f.a.ScalingMetricCollector.getJobTopology(ScalingMetricCollector.java:248)
    at o.a.f.a.ScalingMetricCollector.getJobTopology(ScalingMetricCollector.java:203)
    at o.a.f.a.ScalingMetricCollector.updateMetrics(ScalingMetricCollector.java:121)
    at o.a.f.a.JobAutoScalerImpl.runScalingLogic(JobAutoScalerImpl.java:178)
    at o.a.f.a.JobAutoScalerImpl.scale(JobAutoScalerImpl.java:103)
    at c.u.a.c.p.s.YarnAutoScalerListener.stateHistoryMatch(YarnAutoScalerListener.java:72)
    at c.u.a.w.c.c.WatchdogContextImpl.run(WatchdogContextImpl.java:165)
    at j.u.c.Executors$RunnableAdapter.call(Executors.java:539)
    at j.util.concurrent.FutureTask.run(FutureTask.java:264)
    at j.u.c.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at...
{code}
We should handle this case more gracefully by checking the job status and
skipping metric collection, rather than logging a NullPointerException.
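
As a rough illustration of the kind of guard meant here, the sketch below checks for a missing plan or a missing "nodes" array before iterating. It is not the autoscaler's actual code: the class PlanGuardSketch and the methods vertexIds and describe are hypothetical, the plan is modeled as a plain java.util.Map instead of the shaded Jackson node types, and the status strings are only examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class PlanGuardSketch {

    // Hypothetical helper: extracts vertex ids from a parsed JSON plan.
    // Returns Optional.empty() when the plan or its "nodes" array is missing,
    // instead of iterating over null and throwing a NullPointerException.
    static Optional<List<String>> vertexIds(Map<String, Object> plan) {
        Object nodes = (plan == null) ? null : plan.get("nodes");
        if (!(nodes instanceof List)) {
            return Optional.empty();
        }
        List<String> ids = new ArrayList<>();
        for (Object node : (List<?>) nodes) {
            if (node instanceof Map) {
                Object id = ((Map<?, ?>) node).get("id");
                if (id != null) {
                    ids.add(id.toString());
                }
            }
        }
        return Optional.of(ids);
    }

    // Hypothetical caller: reports the job status when no plan is available,
    // so the log explains why the scaling cycle was skipped.
    static String describe(String jobStatus, Map<String, Object> plan) {
        return vertexIds(plan)
                .map(ids -> "topology with " + ids.size() + " vertices")
                .orElse("skipping metric collection: no execution plan (job status: "
                        + jobStatus + ")");
    }

    public static void main(String[] args) {
        Map<String, Object> plan =
                Map.of("nodes", List.of(Map.of("id", "a"), Map.of("id", "b")));
        System.out.println(describe("RUNNING", plan));
        System.out.println(describe("RESTARTING", null));
    }
}
```

Returning an Optional leaves the decision to the caller, which can skip the current cycle and retry on the next one rather than fail with a stack trace.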
--
This message was sent by Atlassian Jira
(v8.20.10#820010)