Gallardot opened a new issue, #14698: URL: https://github.com/apache/dolphinscheduler/issues/14698
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.

### What happened

We found some NullPointerExceptions on the master server.

```
[INFO] 2023-08-03 10:58:00.014 +0800 org.apache.dolphinscheduler.scheduler.quartz.ProcessScheduleTask:[68] - [WorkflowInstance-0][TaskInstance-0] - scheduled fire time :Thu Aug 03 10:58:00 CST 2023, fire time :Thu Aug 03 10:58:00 CST 2023, scheduleId :5
[INFO] 2023-08-03 10:58:00.406 +0800 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap:[178] - [WorkflowInstance-0][TaskInstance-0] - Master schedule bootstrap loop command success, fetch command size: 1, cost: 2ms, current slot: 0, total slot size: 1
[ERROR] 2023-08-03 10:58:00.423 +0800 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap:[147] - [WorkflowInstance-0][TaskInstance-0] - Master handle command 6601 error
org.apache.dolphinscheduler.server.master.exception.WorkflowCreateException: Create workflow execute runnable failed
	at org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnableFactory.createWorkflowExecuteRunnable(WorkflowExecuteRunnableFactory.java:95)
	at org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap.lambda$run$0(MasterSchedulerBootstrap.java:136)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
	at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
	at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
	at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
	at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
	at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:650)
	at org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap.run(MasterSchedulerBootstrap.java:133)
Caused by: java.lang.NullPointerException: null
	at org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteContextFactory.createWorkflowInstance(WorkflowExecuteContextFactory.java:79)
	at org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteContextFactory.createWorkflowExecuteRunnableContext(WorkflowExecuteContextFactory.java:54)
	at org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnableFactory.createWorkflowExecuteRunnable(WorkflowExecuteRunnableFactory.java:81)
	... 15 common frames omitted
[INFO] 2023-08-03 10:58:31.177 +0800 org.apache.dolphinscheduler.server.master.runner.StateWheelExecuteThread:[315] - [WorkflowInstance-6581][TaskInstance-0] - [TaskInstanceKey-10444805098592:3]The task instance can retry, will retry this task instance
[INFO] 2023-08-03 10:58:31.177 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThreadPool:[101] - [WorkflowInstance-6581][TaskInstance-0] - Submit state event success, stateEvent: TaskStateEvent(processInstanceId=6581, taskInstanceId=null, taskCode=10444805098592, status=TaskExecutionStatus{code=1, desc='running'}, type=TASK_RETRY, key=null, channel=null, context=null)
[INFO] 2023-08-03 10:58:31.259 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[290] - [WorkflowInstance-6581][TaskInstance-null] - Begin to handle state event, TaskStateEvent(processInstanceId=6581, taskInstanceId=null, taskCode=10444805098592, status=TaskExecutionStatus{code=1, desc='running'}, type=TASK_RETRY, key=null, channel=null, context=null)
```

We have currently identified that the NPE is caused by the fact that this piece of code returns null.

https://github.com/apache/dolphinscheduler/blob/5ec9085113c989c06a23d0e6820627b147dad15e/dolphinscheduler-service/src/main/java/org/apache/dolphinscheduler/service/process/ProcessServiceImpl.java#L331-L335

@ruanwenjun PTAL.

ref: https://github.com/apache/dolphinscheduler/pull/14544

### What you expected to happen

no NPE error

### How to reproduce

A new cluster has been deployed through k8s with the dev branch, and MySQL is used as the database. The following steps were taken:

1. Created a shell task with the script content 'exit 10086;' and set the task to retry once.
2. Created a corresponding workflow with a serial wait execution policy.
3. Created a scheduled task to execute every minute.

After waiting for the task to execute several times, a NullPointerException error log appeared.

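
For illustration only, the failure mode described under "What happened" can be sketched as below. This is a minimal, hypothetical sketch and not the actual DolphinScheduler code: `NullableHandleCommandSketch`, `Instance`, `handleCommand`, `createUnguarded`, and `createGuarded` are made-up names. The idea that the null comes back when a serial-wait workflow has to wait is an assumption inferred from the reproduction steps, not something confirmed in the linked code.

```java
import java.util.Optional;

public class NullableHandleCommandSketch {

    // Hypothetical stand-in for a workflow instance; not the real ProcessInstance class.
    public static final class Instance {
        final int id;

        Instance(int id) {
            this.id = id;
        }
    }

    // Hypothetical stand-in for the service call that may return null.
    // ASSUMPTION: null is returned when the workflow must wait instead of running now.
    public static Instance handleCommand(int commandId, boolean mustWait) {
        return mustWait ? null : new Instance(commandId);
    }

    // Unguarded caller: dereferences the result directly, so a null return
    // surfaces as a NullPointerException in the caller.
    public static int createUnguarded(int commandId, boolean mustWait) {
        Instance instance = handleCommand(commandId, mustWait);
        return instance.id;
    }

    // Guarded caller: treats a null result as "nothing to run" and returns an empty Optional.
    public static Optional<Integer> createGuarded(int commandId, boolean mustWait) {
        return Optional.ofNullable(handleCommand(commandId, mustWait)).map(i -> i.id);
    }

    public static void main(String[] args) {
        // The guarded path yields an empty Optional instead of throwing.
        System.out.println(createGuarded(6601, true).isPresent());
        try {
            createUnguarded(6601, true);
        } catch (NullPointerException e) {
            System.out.println("unguarded caller threw NPE");
        }
    }
}
```

Whether the right fix is a null check in the caller or a non-null return contract in the service is a design decision for the maintainers; the sketch only shows why the unguarded dereference fails.
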
### Anything else

_No response_

### Version

dev

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
