[ 
https://issues.apache.org/jira/browse/FLINK-36645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-36645:
-----------------------------------
    Labels: pull-request-available  (was: )

> Gracefully handle null execution plan from autoscaler
> -----------------------------------------------------
>
>                 Key: FLINK-36645
>                 URL: https://issues.apache.org/jira/browse/FLINK-36645
>             Project: Flink
>          Issue Type: Improvement
>          Components: Autoscaler, Kubernetes Operator
>            Reporter: Sai Sharath Dandi
>            Assignee: Sai Sharath Dandi
>            Priority: Not a Priority
>              Labels: pull-request-available
>
> Occasionally, we see error logs like the one below from the autoscaler 
> module. This happens because the job manager REST API returns a null 
> execution plan when a scaling action is already in progress.
>  
> {code:java}
> Error while scaling job
> java.lang.NullPointerException: Cannot invoke "org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ArrayNode.iterator()" because "nodes" is null
>     at o.a.f.a.topology.JobTopology.fromJsonPlan(JobTopology.java:155)
>     at o.a.f.a.ScalingMetricCollector.getJobTopology(ScalingMetricCollector.java:248)
>     at o.a.f.a.ScalingMetricCollector.getJobTopology(ScalingMetricCollector.java:203)
>     at o.a.f.a.ScalingMetricCollector.updateMetrics(ScalingMetricCollector.java:121)
>     at o.a.f.a.JobAutoScalerImpl.runScalingLogic(JobAutoScalerImpl.java:178)
>     at o.a.f.a.JobAutoScalerImpl.scale(JobAutoScalerImpl.java:103)
>     at c.u.a.c.p.s.YarnAutoScalerListener.stateHistoryMatch(YarnAutoScalerListener.java:72)
>     at c.u.a.w.c.c.WatchdogContextImpl.run(WatchdogContextImpl.java:165)
>     at j.u.c.Executors$RunnableAdapter.call(Executors.java:539)
>     at j.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at j.u.c.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
>     at...
> {code}
> We need to handle this case more gracefully by detecting the job status 
> rather than logging an NPE.
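
For illustration only, the handling described above might look roughly like the sketch below: check the job status first, then treat a missing plan or a missing "nodes" entry as "topology not available yet" instead of dereferencing it. The class PlanGuard, the Status enum, and the Map-based plan are hypothetical stand-ins chosen to keep the example self-contained. They are not the autoscaler's actual types, and this is not the change made in the pull request.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

class PlanGuard {
    // Hypothetical status values for this sketch; not Flink's JobStatus enum.
    enum Status { RUNNING, RESTARTING }

    /**
     * Returns the vertex ids from the plan, or Optional.empty() when the
     * topology cannot be read yet (job not running, plan null, or no "nodes").
     * The Map stands in for the parsed JSON plan (an assumption of this sketch).
     */
    static Optional<List<String>> vertexIds(Status status, Map<String, List<String>> plan) {
        if (status != Status.RUNNING) {
            // A scaling action or restart is in progress: skip this round.
            return Optional.empty();
        }
        if (plan == null) {
            return Optional.empty();
        }
        List<String> nodes = plan.get("nodes");
        if (nodes == null) {
            // This is the case that caused the NullPointerException above.
            return Optional.empty();
        }
        return Optional.of(List.copyOf(nodes));
    }
}
```

A caller could then log a short informational message and retry on the next evaluation cycle when the result is empty, rather than surfacing a stack trace.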



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
