[
https://issues.apache.org/jira/browse/FLINK-24240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zheren Yu closed FLINK-24240.
-----------------------------
Resolution: Not A Bug
> HA JobGraph deserialization problem when migrate 1.12.4 to 1.13.2
> -----------------------------------------------------------------
>
> Key: FLINK-24240
> URL: https://issues.apache.org/jira/browse/FLINK-24240
> Project: Flink
> Issue Type: Bug
> Components: Runtime / State Backends
> Affects Versions: 1.13.2
> Reporter: Zheren Yu
> Priority: Major
>
> We are using HA with Flink on k8s, which creates a ConfigMap such as
> `xxx-dispatcher-leader` and stores the JobGraph in it. When we upgrade
> from 1.12.4 to 1.13.2 without stopping the job, the JobGraph created by the
> old version is deserialized without the jobType field, which causes
> the problem below:
> {code:java}
> Caused by: java.lang.NullPointerException
> at org.apache.flink.runtime.deployment.TaskDeploymentDescriptorFactory$PartitionLocationConstraint.fromJobType(TaskDeploymentDescriptorFactory.java:282) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
> at org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:347) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
> at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:190) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
> at org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:122) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
> at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:132) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
> at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:110) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
> at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:340) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
> at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:317) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
> at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:107) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
> at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:95) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
> at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
> at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_302]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_302]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_302]
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_302]
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_302]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_302]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_302]
> at java.lang.Thread.run(Thread.java:748)
> {code}
> I am just wondering whether there is any workaround for this?
> (I know that manually stopping the job may work.)
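
For readers unfamiliar with the failure mode described in the report, here is a minimal, self-contained sketch of how a field that is absent from a serialized byte stream comes back as null under plain Java serialization, and how a switch on that null value then throws a NullPointerException. Everything in it is hypothetical: `StoredGraph` and `constraintFor` are illustrative stand-ins, not Flink's JobGraph or its `fromJobType` method, and the `transient` modifier is used only to imitate bytes written by a class version that did not yet have the field. Whether the real method uses a switch is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class MissingFieldDemo {

    enum JobType { BATCH, STREAMING }

    /** Hypothetical stand-in for a persisted graph object; not Flink's JobGraph. */
    static class StoredGraph implements Serializable {
        private static final long serialVersionUID = 1L;

        final String name;

        // 'transient' keeps this field out of the byte stream, imitating bytes
        // written by an older class version that did not have the field yet.
        transient JobType jobType = JobType.STREAMING;

        StoredGraph(String name) {
            this.name = name;
        }
    }

    /** Serializes the graph to bytes and reads it back, as an HA store would on recovery. */
    static StoredGraph roundTrip(StoredGraph graph) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(graph);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (StoredGraph) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    /** Hypothetical consumer of the field; a switch on a null enum throws NullPointerException. */
    static String constraintFor(JobType jobType) {
        switch (jobType) {
            case BATCH:
                return "CAN_BE_UNKNOWN";
            case STREAMING:
                return "MUST_BE_KNOWN";
            default:
                throw new IllegalArgumentException("Unknown job type: " + jobType);
        }
    }

    public static void main(String[] args) {
        StoredGraph restored = roundTrip(new StoredGraph("demo"));
        // Field initializers are not run during deserialization, so the field is null here.
        System.out.println("jobType after restore: " + restored.jobType);
        try {
            constraintFor(restored.jobType);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException when switching on the missing field");
        }
    }
}
```

The point of the sketch is only that deserialization does not run field initializers or constructors of the serializable class, so a field missing from the stored bytes is left at its default value (null for object references) unless the class fills it in itself.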