reele opened a new issue #6055:
URL: https://github.com/apache/dolphinscheduler/issues/6055


   **Describe the bug**
   When recovering a stopping instance, the sub-process task's state may be `KILL` even though the sub-process instance has already been submitted by the `RECOVER_TOLERANCE_FAULT_PROCESS` command. In that case `SubProcessTaskExecThread.waitTaskQuit()` will [return directly](https://github.com/apache/dolphinscheduler/blob/e0eea995200f673d6406ec62c464c77f1d5b6171/dolphinscheduler-server/src/main/java/org/apache/dolphinscheduler/server/master/runner/SubProcessTaskExecThread.java#L128) and [set the task state to the sub-process's state](https://github.com/reele/dolphinscheduler/blob/3215cfb9f7c62bef7fa197b37ffc38cedd2c7ef5/dolphinscheduler-server/src/main/java/org/apache/dolphinscheduler/server/master/runner/SubProcessTaskExecThread.java#L66), even if the sub-process is still running. The sub-process task therefore ends with an unfinished state, and the parent thread `MasterExecThread` falls into an endless loop.
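The early-return path described above can be illustrated with a minimal, self-contained model. Everything in it is hypothetical: `ExecutionStatus`, `TaskInstance` and `ProcessInstance` are simplified stand-ins rather than the real DolphinScheduler classes, and `isFinished()` assumes that only STOP, KILL, SUCCESS and FAILURE count as final states.

```java
import java.util.EnumSet;

public class SubProcessEarlyReturnSketch {

    // Hypothetical subset of task/process states (not the real enum).
    enum ExecutionStatus {
        SUBMITTED_SUCCESS, RUNNING_EXECUTION, READY_STOP, STOP, KILL, SUCCESS, FAILURE;

        // Assumption: only these states are treated as final.
        boolean isFinished() {
            return EnumSet.of(STOP, KILL, SUCCESS, FAILURE).contains(this);
        }
    }

    static final class TaskInstance {
        ExecutionStatus state;
        TaskInstance(ExecutionStatus state) { this.state = state; }
    }

    static final class ProcessInstance {
        ExecutionStatus state;
        ProcessInstance(ExecutionStatus state) { this.state = state; }
    }

    // Models waitTaskQuit(): it returns as soon as the task row already looks
    // finished, without checking whether the sub-process itself has finished.
    static void waitTaskQuit(TaskInstance task, ProcessInstance subProcess) {
        if (task.state.isFinished()) {
            return;
        }
        // A real implementation would poll subProcess here until it finishes;
        // this model does not need that branch.
    }

    // Models the exec thread: wait, then copy the sub-process state onto the task.
    static void runSubProcessTask(TaskInstance task, ProcessInstance subProcess) {
        waitTaskQuit(task, subProcess);
        task.state = subProcess.state;
    }

    public static void main(String[] args) {
        // Task row was submitted as KILL, but the sub-process is still stopping.
        TaskInstance task = new TaskInstance(ExecutionStatus.KILL);
        ProcessInstance subProcess = new ProcessInstance(ExecutionStatus.READY_STOP);
        runSubProcessTask(task, subProcess);
        // prints "READY_STOP finished=false"
        System.out.println(task.state + " finished=" + task.state.isFinished());
    }
}
```

In this model the task leaves its exec thread carrying READY_STOP, a state that is neither running nor final, which matches the `task ... complete, state is READY_STOP` lines in the log.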
   
   
   **To Reproduce**
   Here is a log example.
   In the beginning, process TRIGGER_D_DW_STS (id: 3342, state: READY_STOP) has a sub-process task STS_D_T88 (id: 62930, state: KILL), and sub-process task STS_D_T88 (id: 62930) points to process STS_D_T88 (id: 3375, state: READY_STOP).
   
   At 2021-07-31 19:19:55.010, the state of sub-process task STS_D_T88 changed from KILL to READY_STOP, and after that the master loops forever.
   
```
[INFO] 2021-07-31 19:19:53.236 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[153] - start master exec thread , split DAG ...
[INFO] 2021-07-31 19:19:53.792 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[145] - find one command: id: 9515, type: RECOVER_TOLERANCE_FAULT_PROCESS
[INFO] 2021-07-31 19:19:53.809 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[242] - process 3337 start to complement 2021-07-30 00:00:00 data
[INFO] 2021-07-31 19:19:53.844 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[315] - prepare process :3337 end
[INFO] 2021-07-31 19:19:53.919 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[792] - add task to stand by list: TRIGGER_D_DW_STS
[INFO] 2021-07-31 19:19:53.933 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[805] - remove task from stand by list: TRIGGER_D_DW_STS
[INFO] 2021-07-31 19:19:53.945 org.apache.dolphinscheduler.service.process.ProcessService:[845] - start submit task : TRIGGER_D_DW_STS, instance id:3337, state: READY_STOP
[INFO] 2021-07-31 19:19:53.950 org.apache.dolphinscheduler.service.process.ProcessService:[858] - end submit task to db successfully:TRIGGER_D_DW_STS state:KILL complete, instance id:3337 state: READY_STOP
[INFO] 2021-07-31 19:19:53.959 org.apache.dolphinscheduler.server.master.runner.SubProcessTaskExecThread:[121] - wait sub work flow: TRIGGER_D_DW_STS complete
[INFO] 2021-07-31 19:19:53.959 org.apache.dolphinscheduler.server.master.runner.SubProcessTaskExecThread:[124] - sub work flow task TRIGGER_D_DW_STS already complete. task state:KILL, parent work flow instance state:READY_STOP
[INFO] 2021-07-31 19:19:53.963 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[153] - start master exec thread , split DAG ...
[INFO] 2021-07-31 19:19:53.969 org.apache.dolphinscheduler.server.master.runner.SubProcessTaskExecThread:[71] - subflow task :TRIGGER_D_DW_STS id:62897, process id:3337, exec thread completed
[INFO] 2021-07-31 19:19:53.975 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[145] - find one command: id: 9516, type: RECOVER_TOLERANCE_FAULT_PROCESS
[INFO] 2021-07-31 19:19:53.989 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[315] - prepare process :3342 end
[INFO] 2021-07-31 19:19:53.994 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[153] - start master exec thread , split DAG ...
[INFO] 2021-07-31 19:19:54.001 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[792] - add task to stand by list: STS_D_T88
[INFO] 2021-07-31 19:19:54.002 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[145] - find one command: id: 9517, type: RECOVER_TOLERANCE_FAULT_PROCESS
[INFO] 2021-07-31 19:19:54.002 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[805] - remove task from stand by list: STS_D_T88
[INFO] 2021-07-31 19:19:54.019 org.apache.dolphinscheduler.service.process.ProcessService:[845] - start submit task : STS_D_T88, instance id:3342, state: READY_STOP
[INFO] 2021-07-31 19:19:54.023 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[315] - prepare process :3375 end
[INFO] 2021-07-31 19:19:54.025 org.apache.dolphinscheduler.service.process.ProcessService:[858] - end submit task to db successfully:STS_D_T88 state:KILL complete, instance id:3342 state: READY_STOP
[INFO] 2021-07-31 19:19:54.030 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[153] - start master exec thread , split DAG ...
[INFO] 2021-07-31 19:19:54.037 org.apache.dolphinscheduler.server.master.runner.SubProcessTaskExecThread:[121] - wait sub work flow: STS_D_T88 complete
[INFO] 2021-07-31 19:19:54.038 org.apache.dolphinscheduler.server.master.runner.SubProcessTaskExecThread:[124] - sub work flow task STS_D_T88 already complete. task state:KILL, parent work flow instance state:READY_STOP
[INFO] 2021-07-31 19:19:54.042 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[792] - add task to stand by list: CDB_T88_EMPLY_BIZ_STAT_SUM_CDM_1
[INFO] 2021-07-31 19:19:54.044 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[805] - remove task from stand by list: CDB_T88_EMPLY_BIZ_STAT_SUM_CDM_1
[INFO] 2021-07-31 19:19:54.053 org.apache.dolphinscheduler.server.master.runner.DependentTaskExecThread:[76] - dependent task start
[INFO] 2021-07-31 19:19:54.058 org.apache.dolphinscheduler.service.process.ProcessService:[845] - start submit task : CDB_T88_EMPLY_BIZ_STAT_SUM_CDM_1, instance id:3375, state: READY_STOP
[INFO] 2021-07-31 19:19:54.060 org.apache.dolphinscheduler.server.master.runner.SubProcessTaskExecThread:[71] - subflow task :STS_D_T88 id:62930, process id:3342, exec thread completed
[INFO] 2021-07-31 19:19:54.063 org.apache.dolphinscheduler.service.process.ProcessService:[858] - end submit task to db successfully:CDB_T88_EMPLY_BIZ_STAT_SUM_CDM_1 state:KILL complete, instance id:3375 state: READY_STOP
[INFO] 2021-07-31 19:19:54.063 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[315] - prepare process :3375 end
[INFO] 2021-07-31 19:19:54.081 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[498] - task CDB_T88_EMPLY_BIZ_STAT_SUM_CDM_1 stopped, the state is KILL
[INFO] 2021-07-31 19:19:54.091  - [taskAppId=TASK-7187-3375-63133]:[133] - wait depend task : CDB_T88_EMPLY_BIZ_STAT_SUM_CDM_1 complete
[INFO] 2021-07-31 19:19:54.948 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[864] - task :TRIGGER_D_DW_STS, id:62897 complete, state is READY_STOP
[INFO] 2021-07-31 19:19:55.010 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[864] - task :STS_D_T88, id:62930 complete, state is READY_STOP
[INFO] 2021-07-31 19:19:55.053 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[864] - task :CDB_T88_EMPLY_BIZ_STAT_SUM_CDM_1, id:63133 complete, state is KILL
[ERROR] 2021-07-31 19:19:55.088 org.apache.dolphinscheduler.common.utils.DateUtils:[131] - error while parse date:null
java.lang.NullPointerException: text
    at java.util.Objects.requireNonNull(Objects.java:228)
    at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1848)
    at java.time.LocalDateTime.parse(LocalDateTime.java:492)
    at org.apache.dolphinscheduler.common.utils.DateUtils.parse(DateUtils.java:128)
    at org.apache.dolphinscheduler.common.utils.DateUtils.stringToDate(DateUtils.java:144)
    at org.apache.dolphinscheduler.common.utils.DateUtils.getScheduleDate(DateUtils.java:240)
    at org.apache.dolphinscheduler.server.master.runner.MasterExecThread.isComplementEnd(MasterExecThread.java:749)
    at org.apache.dolphinscheduler.server.master.runner.MasterExecThread.getProcessInstanceState(MasterExecThread.java:695)
    at org.apache.dolphinscheduler.server.master.runner.MasterExecThread.updateProcessInstanceState(MasterExecThread.java:762)
    at org.apache.dolphinscheduler.server.master.runner.MasterExecThread.runProcess(MasterExecThread.java:922)
    at org.apache.dolphinscheduler.server.master.runner.MasterExecThread.executeProcess(MasterExecThread.java:200)
    at org.apache.dolphinscheduler.server.master.runner.MasterExecThread.run(MasterExecThread.java:181)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[INFO] 2021-07-31 19:19:55.088 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[764] - work flow process instance [id: 3375, name:STS_D_T88-1-1627719975621], state change from READY_STOP to STOP, cmd type: RECOVER_TOLERANCE_FAULT_PROCESS
[INFO] 2021-07-31 19:19:55.102 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[925] - process:3375 end, state :STOP
[INFO] 2021-07-31 19:19:55.959 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[864] - task :TRIGGER_D_DW_STS, id:62897 complete, state is READY_STOP
[INFO] 2021-07-31 19:19:56.017 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[864] - task :STS_D_T88, id:62930 complete, state is READY_STOP
[INFO] 2021-07-31 19:19:56.060 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[764] - work flow process instance [id: 3375, name:STS_D_T88-1-1627719975621], state change from READY_STOP to STOP, cmd type: RECOVER_TOLERANCE_FAULT_PROCESS
[INFO] 2021-07-31 19:19:56.073 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[925] - process:3375 end, state :STOP
[INFO] 2021-07-31 19:19:56.968 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[864] - task :TRIGGER_D_DW_STS, id:62897 complete, state is READY_STOP
[INFO] 2021-07-31 19:19:57.024 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[864] - task :STS_D_T88, id:62930 complete, state is READY_STOP
[INFO] 2021-07-31 19:19:57.978 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[864] - task :TRIGGER_D_DW_STS, id:62897 complete, state is READY_STOP
[INFO] 2021-07-31 19:19:58.032 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[864] - task :STS_D_T88, id:62930 complete, state is READY_STOP
[INFO] 2021-07-31 19:19:58.988 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[864] - task :TRIGGER_D_DW_STS, id:62897 complete, state is READY_STOP
[INFO] 2021-07-31 19:19:59.040 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[864] - task :STS_D_T88, id:62930 complete, state is READY_STOP
[INFO] 2021-07-31 19:19:59.998 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[864] - task :TRIGGER_D_DW_STS, id:62897 complete, state is READY_STOP
[INFO] 2021-07-31 19:20:00.047 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[864] - task :STS_D_T88, id:62930 complete, state is READY_STOP
[INFO] 2021-07-31 19:20:01.007 org.apache.dolphinscheduler.server.master.runner.MasterExecThread:[864] - task :TRIGGER_D_DW_STS, id:62897 complete, state is READY_STOP
```
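The repeating `task ... complete, state is READY_STOP` lines are consistent with a polling loop that drops a task from its active set only once the task's state is final. The sketch below is a hypothetical, bounded model of such a loop, not the real `MasterExecThread` code; `poll` and `maxRounds` are invented for illustration, `OTHER_KILLED_TASK` is an invented task added only for contrast, and the set of final states is an assumption.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParentLoopSketch {

    enum ExecutionStatus {
        READY_STOP, STOP, KILL, SUCCESS, FAILURE;

        // Assumption: only these states are treated as final.
        boolean isFinished() {
            return EnumSet.of(STOP, KILL, SUCCESS, FAILURE).contains(this);
        }
    }

    // Runs at most maxRounds polling rounds over tasks whose exec threads are done.
    // activeTasks maps a task name to the state read back after its thread completed;
    // a task is removed only when that state is final. Returns the log lines produced.
    static List<String> poll(Map<String, ExecutionStatus> activeTasks, int maxRounds) {
        List<String> log = new ArrayList<>();
        for (int round = 0; round < maxRounds && !activeTasks.isEmpty(); round++) {
            // Iterate over a copy of the keys so entries can be removed safely.
            for (String name : new ArrayList<>(activeTasks.keySet())) {
                ExecutionStatus state = activeTasks.get(name);
                if (state.isFinished()) {
                    activeTasks.remove(name);
                }
                log.add("task :" + name + " complete, state is " + state);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        Map<String, ExecutionStatus> active = new LinkedHashMap<>();
        active.put("STS_D_T88", ExecutionStatus.READY_STOP);   // stuck in a non-final state
        active.put("OTHER_KILLED_TASK", ExecutionStatus.KILL); // hypothetical, for contrast
        List<String> log = poll(active, 3);
        log.forEach(System.out::println);
        // Without the maxRounds bound, STS_D_T88 would be logged forever.
        System.out.println("still active: " + active.keySet()); // [STS_D_T88]
    }
}
```

The killed task is logged once and removed, while the READY_STOP task is logged on every round and never leaves the active set, which is the shape of the endless loop in the log.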
   
   **Expected behavior**
   I think sub-process and dependent tasks should always be submitted with the SUBMITTED_SUCCESS state in [ProcessService.getSubmitTaskState](https://github.com/apache/dolphinscheduler/blob/e0eea995200f673d6406ec62c464c77f1d5b6171/dolphinscheduler-service/src/main/java/org/apache/dolphinscheduler/service/process/ProcessService.java#L1280).
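A rough sketch of the proposed rule, assuming task types are identified by the strings `SUB_PROCESS` and `DEPENDENT`. The method name mirrors `ProcessService.getSubmitTaskState`, but the signature, the `isSubProcessOrDependent` helper and the fallback rules for other task types are simplified assumptions, not the actual implementation.

```java
public class SubmitTaskStateSketch {

    // Hypothetical subset of execution states.
    enum ExecutionStatus {
        SUBMITTED_SUCCESS, RUNNING_EXECUTION, DELAY_EXECUTION, READY_PAUSE, PAUSE, READY_STOP, KILL
    }

    // Hypothetical helper: the two task types the proposal singles out.
    static boolean isSubProcessOrDependent(String taskType) {
        return "SUB_PROCESS".equals(taskType) || "DEPENDENT".equals(taskType);
    }

    static ExecutionStatus getSubmitTaskState(String taskType,
                                              ExecutionStatus taskState,
                                              ExecutionStatus processInstanceState) {
        // Proposed rule: always submit sub-process and dependent tasks as
        // SUBMITTED_SUCCESS, so their exec threads really wait for the
        // sub-process / dependencies instead of returning immediately.
        if (isSubProcessOrDependent(taskType)) {
            return ExecutionStatus.SUBMITTED_SUCCESS;
        }
        // Simplified fallback for other task types (assumption): keep states
        // meaning the task is already running, delayed or killed ...
        if (taskState == ExecutionStatus.RUNNING_EXECUTION
                || taskState == ExecutionStatus.DELAY_EXECUTION
                || taskState == ExecutionStatus.KILL) {
            return taskState;
        }
        // ... otherwise follow the parent process instance's pause/stop request.
        if (processInstanceState == ExecutionStatus.READY_PAUSE) {
            return ExecutionStatus.PAUSE;
        }
        if (processInstanceState == ExecutionStatus.READY_STOP) {
            return ExecutionStatus.KILL;
        }
        return ExecutionStatus.SUBMITTED_SUCCESS;
    }

    public static void main(String[] args) {
        // The situation from the log: task row is KILL, parent is READY_STOP.
        System.out.println(getSubmitTaskState("SUB_PROCESS", ExecutionStatus.KILL, ExecutionStatus.READY_STOP)); // SUBMITTED_SUCCESS
        System.out.println(getSubmitTaskState("SHELL", ExecutionStatus.SUBMITTED_SUCCESS, ExecutionStatus.READY_STOP)); // KILL
    }
}
```

Under this rule, the STS_D_T88 task in the log would enter `waitTaskQuit()` as SUBMITTED_SUCCESS and wait for process 3375 to reach STOP, instead of returning immediately and inheriting READY_STOP.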
   
   
   
   **Which version of Dolphin Scheduler:**
    - [any]
   

