Zheren Yu created FLINK-24240:
---------------------------------
Summary: HA JobGraph deserialization problem when migrate 1.12.4
to 1.13.2
Key: FLINK-24240
URL: https://issues.apache.org/jira/browse/FLINK-24240
Project: Flink
Issue Type: Bug
Components: Runtime / State Backends
Affects Versions: 1.13.2
Reporter: Zheren Yu
We run Flink on Kubernetes with HA enabled, which creates a ConfigMap such as
`xxx-dispatcher-leader` and stores the JobGraph in it. When we upgrade from
1.12.4 to 1.13.2 without stopping the job, the JobGraph created by the old
version is deserialized without the `jobType` field, which causes the
following exception:
```
Caused by: java.lang.NullPointerException
    at org.apache.flink.runtime.deployment.TaskDeploymentDescriptorFactory$PartitionLocationConstraint.fromJobType(TaskDeploymentDescriptorFactory.java:282) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:347) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:190) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:122) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:132) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:110) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:340) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:317) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:107) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:95) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112) ~[flink-dist_2.12-1.13.2.jar:1.13.2]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_302]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_302]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_302]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_302]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_302]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_302]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_302]
    at java.lang.Thread.run(Thread.java:748)
```
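For illustration, here is a minimal, self-contained sketch of how a field that is absent from (or null in) a serialized object can be given a default on read. It assumes the JobGraph is persisted with plain Java serialization, where a field missing from the stream is left at `null` after deserialization. `SketchJobGraph`, `SketchJobType`, and the choice of `STREAMING` as the fallback are hypothetical stand-ins for this example, not Flink classes and not a confirmed fix. The old-format payload is imitated by serializing an object whose `jobType` is `null`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class JobTypeDefaultSketch {

    // Hypothetical stand-in for a job type enum; not Flink's class.
    enum SketchJobType { BATCH, STREAMING }

    // Hypothetical stand-in for a serializable job graph; not Flink's JobGraph.
    static class SketchJobGraph implements Serializable {
        private static final long serialVersionUID = 1L;

        private SketchJobType jobType;

        SketchJobGraph(SketchJobType jobType) {
            this.jobType = jobType;
        }

        SketchJobType getJobType() {
            return jobType;
        }

        // Deserialization hook: after the default read, replace a null jobType
        // with a fallback so later code never dereferences null.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            if (jobType == null) {
                jobType = SketchJobType.STREAMING; // assumed default for this sketch
            }
        }
    }

    // Serializes the graph to bytes and reads it back, as an HA store would.
    static SketchJobGraph roundTrip(SketchJobGraph graph)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(graph);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (SketchJobGraph) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // A null jobType imitates a payload written before the field existed.
        SketchJobGraph restored = roundTrip(new SketchJobGraph(null));
        System.out.println(restored.getJobType());
    }
}
```

Without the `readObject` hook, `getJobType()` would return `null` after the round trip, which matches the kind of value that leads to the `NullPointerException` in `fromJobType` above.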
Is there any workaround for this?
(I know that manually stopping the job before the upgrade may work.)