Gallardot opened a new issue, #14698:
URL: https://github.com/apache/dolphinscheduler/issues/14698

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### What happened
   
   We found some NullPointerExceptions on the master server.
   ```
   [INFO] 2023-08-03 10:58:00.014 +0800 
org.apache.dolphinscheduler.scheduler.quartz.ProcessScheduleTask:[68] - 
[WorkflowInstance-0][TaskInstance-0] - scheduled fire time :Thu Aug 03 10:58:00 
CST 2023, fire time :Thu Aug 03 10:58:00 CST 2023, scheduleId :5
   [INFO] 2023-08-03 10:58:00.406 +0800 
org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap:[178] 
- [WorkflowInstance-0][TaskInstance-0] - Master schedule bootstrap loop command 
success, fetch command size: 1, cost: 2ms, current slot: 0, total slot size: 1
   [ERROR] 2023-08-03 10:58:00.423 +0800 
org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap:[147] 
- [WorkflowInstance-0][TaskInstance-0] - Master handle command 6601 error 
   org.apache.dolphinscheduler.server.master.exception.WorkflowCreateException: 
Create workflow execute runnable failed
        at 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnableFactory.createWorkflowExecuteRunnable(WorkflowExecuteRunnableFactory.java:95)
        at 
org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap.lambda$run$0(MasterSchedulerBootstrap.java:136)
        at 
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
        at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
        at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
        at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
        at 
java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
        at 
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
        at 
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
        at 
java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:650)
        at 
org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap.run(MasterSchedulerBootstrap.java:133)
   Caused by: java.lang.NullPointerException: null
        at 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteContextFactory.createWorkflowInstance(WorkflowExecuteContextFactory.java:79)
        at 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteContextFactory.createWorkflowExecuteRunnableContext(WorkflowExecuteContextFactory.java:54)
        at 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnableFactory.createWorkflowExecuteRunnable(WorkflowExecuteRunnableFactory.java:81)
        ... 15 common frames omitted
   [INFO] 2023-08-03 10:58:31.177 +0800 
org.apache.dolphinscheduler.server.master.runner.StateWheelExecuteThread:[315] 
- [WorkflowInstance-6581][TaskInstance-0] - 
[TaskInstanceKey-10444805098592:3]The task instance can retry, will retry this 
task instance
   [INFO] 2023-08-03 10:58:31.177 +0800 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThreadPool:[101]
 - [WorkflowInstance-6581][TaskInstance-0] - Submit state event success, 
stateEvent: TaskStateEvent(processInstanceId=6581, taskInstanceId=null, 
taskCode=10444805098592, status=TaskExecutionStatus{code=1, desc='running'}, 
type=TASK_RETRY, key=null, channel=null, context=null)
   [INFO] 2023-08-03 10:58:31.259 +0800 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[290] 
- [WorkflowInstance-6581][TaskInstance-null] - Begin to handle state event, 
TaskStateEvent(processInstanceId=6581, taskInstanceId=null, 
taskCode=10444805098592, status=TaskExecutionStatus{code=1, desc='running'}, 
type=TASK_RETRY, key=null, channel=null, context=null)
   
   ```
   
   
   We have traced the NPE to the following piece of code, which returns null:
   
https://github.com/apache/dolphinscheduler/blob/5ec9085113c989c06a23d0e6820627b147dad15e/dolphinscheduler-service/src/main/java/org/apache/dolphinscheduler/service/process/ProcessServiceImpl.java#L331-L335
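
For illustration only, here is a minimal sketch of the kind of null guard that would avoid this NPE. It is not the DolphinScheduler code: `Command`, `WorkflowInstance`, `InstanceSource` and `createWorkflowInstance` below are hypothetical stand-ins, and the only assumption taken from this report is that the upstream call can return null.

```java
import java.util.Optional;

public class NullGuardSketch {

    /** Hypothetical stand-in for a command fetched by the master. */
    static final class Command {
        final int id;
        Command(int id) { this.id = id; }
    }

    /** Hypothetical stand-in for the workflow instance built from a command. */
    static final class WorkflowInstance {
        final int commandId;
        WorkflowInstance(int commandId) { this.commandId = commandId; }
    }

    /** Hypothetical stand-in for the service call that may return null. */
    interface InstanceSource {
        WorkflowInstance fromCommand(Command command);
    }

    /**
     * Wraps the possibly-null result in an Optional so the caller has to
     * handle the "no instance" case instead of dereferencing null.
     */
    static Optional<WorkflowInstance> createWorkflowInstance(InstanceSource source, Command command) {
        WorkflowInstance instance = source.fromCommand(command);
        if (instance == null) {
            // Assumption: skipping the command is acceptable here; the real fix may differ.
            System.err.println("No workflow instance for command " + command.id + ", skipping");
        }
        return Optional.ofNullable(instance);
    }

    public static void main(String[] args) {
        Command command = new Command(6601);
        Optional<WorkflowInstance> created =
                createWorkflowInstance(c -> new WorkflowInstance(c.id), command);
        Optional<WorkflowInstance> skipped =
                createWorkflowInstance(c -> null, command);
        System.out.println("created present: " + created.isPresent());
        System.out.println("skipped present: " + skipped.isPresent());
    }
}
```

Whether a null result should be skipped, retried, or reported as an error depends on why the upstream code returns null, which this sketch does not try to answer.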
   
   
   
   
   @ruanwenjun PTAL. ref: https://github.com/apache/dolphinscheduler/pull/14544
   
   ### What you expected to happen
   
   No NullPointerException should be thrown.
   
   ### How to reproduce
   
   We deployed a new cluster on k8s from the dev branch, with MySQL as the 
database, and then took the following steps:
   
   1. Created a shell task with the script content 'exit 10086;' and set the 
task to retry once.
   2. Created a corresponding workflow with a serial wait execution policy.
   3. Created a scheduled task to execute every minute.
   
   After the scheduled workflow had executed several times, a 
NullPointerException error log appeared.
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   dev
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


