github-actions[bot] commented on issue #14428:
URL: https://github.com/apache/dolphinscheduler/issues/14428#issuecomment-1614023837

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   ### Brief
   w2#task1 is a dependent task that depends on w1#task1 and has timeout
   failure turned on (e.g. timeout after 1 min). When w1#task1 has not been
   executed within the dependent interval, w2#task1 fails because of the
   timeout, but the failure is not reported to w2. w2 therefore stays stuck in
   *running*, which is not the preferred behavior, since w2 is not even aware
   that its task1 has failed.
   
   The timeout failure somehow does not change the taskStateEvent, so in
   TaskStateEventHandler this situation falls into an error condition, which
   is logged as:
   `
   if (task.getState().isFinished()
           && (taskStateEvent.getStatus() != null && taskStateEvent.getStatus().isRunning())) {
       String errorMessage = String.format(
               "The current task instance state is %s, but the task state event status is %s, so the task state event will be ignored",
               task.getState().name(),
               taskStateEvent.getStatus().name());
       log.warn(errorMessage);
       throw new StateEventHandleError(errorMessage);
   }
   `
   w2 never responds to the failure and stays stuck in running. If this is
   not handled, it can lead to further misleading situations.
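
   The sequence described above can be modeled with a small, self-contained
   sketch. This is not DolphinScheduler code: the class `StuckWorkflowSketch`,
   its fields, and the simplified `taskFinished` stand-in are hypothetical, and
   the assumption that the timeout handler marks the task as failed without
   notifying the workflow is inferred from the log in this report. It replays
   the event order seen in the log (running, timeout, running) against a guard
   equivalent to the one quoted above.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical, simplified model of the master's event handling (not project code).
final class StuckWorkflowSketch {
    enum State { RUNNING, FAILURE }
    enum EventType { TASK_STATE_CHANGE, TASK_TIMEOUT }

    static final class Event {
        final EventType type;
        final State status; // null for TASK_TIMEOUT, as in the logged events

        Event(EventType type, State status) {
            this.type = type;
            this.status = status;
        }
    }

    State taskState = State.RUNNING;
    State workflowState = State.RUNNING;
    boolean taskFinishedCalled = false;

    // Stand-in for workflowExecuteRunnable.taskFinished: propagates a task failure.
    void taskFinished() {
        taskFinishedCalled = true;
        if (taskState == State.FAILURE) {
            workflowState = State.FAILURE;
        }
    }

    void handle(Event event) {
        if (event.type == EventType.TASK_TIMEOUT) {
            // Assumption: the timeout marks the task failed but never reaches taskFinished.
            taskState = State.FAILURE;
            return;
        }
        // Equivalent of the guard quoted from TaskStateEventHandler.
        if (taskState == State.FAILURE && event.status == State.RUNNING) {
            throw new IllegalStateException(
                    "task is " + taskState + " but event status is " + event.status);
        }
        if (event.status == State.FAILURE) {
            taskState = State.FAILURE;
            taskFinished();
        }
    }

    // Replays the order seen in the log: running, timeout, then a stale running event.
    static StuckWorkflowSketch replay() {
        StuckWorkflowSketch sketch = new StuckWorkflowSketch();
        Queue<Event> events = new ArrayDeque<>();
        events.add(new Event(EventType.TASK_STATE_CHANGE, State.RUNNING));
        events.add(new Event(EventType.TASK_TIMEOUT, null));
        events.add(new Event(EventType.TASK_STATE_CHANGE, State.RUNNING));
        for (Event event : events) {
            try {
                sketch.handle(event);
            } catch (IllegalStateException e) {
                // The event is dropped, like "State event handle error, will remove this event".
            }
        }
        return sketch;
    }

    public static void main(String[] args) {
        StuckWorkflowSketch s = replay();
        System.out.println("task=" + s.taskState + ", workflow=" + s.workflowState
                + ", taskFinishedCalled=" + s.taskFinishedCalled);
    }
}
```

   In this model the task ends in FAILURE while the workflow stays in RUNNING,
   because nothing after the timeout ever reaches the `taskFinished` stand-in.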
   
   ### Replay
   1. Create a workflow w1 with an echo shell task w1#task1.
   2. Create a workflow w2 with a dependent task w2#task1 that depends on
      w1#task1, and an echo shell task w2#task2 that follows w2#task1.
   3. Set the retry times of w2#task1 to 0 and turn on the "timeout failure"
      strategy. w2#task1 depends on w1#task1, and the dependent relation is AND.
   
![image](https://github.com/apache/dolphinscheduler/assets/35411935/cd97e152-a7e0-43bd-8ccf-e928362529d4)
   4. Start w2#task1 directly.
   
   ### Version related
   Present from 3.0.x through dev; the related code has not been modified.
   
   ### What you expected to happen
   
   w2 should know that task1 has failed, and
   workflowExecuteRunnable.taskFinished should be called.
   That is, if w2#task1 fails, w2 should fail *instead of* staying in the
   running state. The workflow should know that a task has failed and use
   workflowExecuteRunnable.taskFinished to handle further operations, such as
   task retry, failover, and workflow status changes.
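
   A minimal sketch of the expected behavior, under stated assumptions: the
   class `ExpectedTimeoutHandlingSketch`, its `onTaskTimeout` method, and the
   simplified retry logic are hypothetical illustrations, not the project's
   handler or an actual fix. It only shows the shape of the expectation: once a
   timeout marks the task as failed, a `taskFinished`-style step runs so the
   workflow can retry the task or fail.

```java
// Hypothetical illustration of the expected flow (not DolphinScheduler code).
final class ExpectedTimeoutHandlingSketch {
    enum State { RUNNING, FAILURE }

    State taskState = State.RUNNING;
    State workflowState = State.RUNNING;
    int retriesLeft;

    ExpectedTimeoutHandlingSketch(int retriesLeft) {
        this.retriesLeft = retriesLeft;
    }

    // Stand-in for workflowExecuteRunnable.taskFinished: retry the task or fail the workflow.
    void taskFinished() {
        if (taskState != State.FAILURE) {
            return;
        }
        if (retriesLeft > 0) {
            retriesLeft--;
            taskState = State.RUNNING; // simplified retry: the task is run again
        } else {
            workflowState = State.FAILURE;
        }
    }

    // Hypothetical timeout hook: with timeout failure enabled, fail the task and notify the workflow.
    void onTaskTimeout(boolean timeoutFailEnabled) {
        if (!timeoutFailEnabled) {
            return; // warn-only strategy: the task keeps running
        }
        taskState = State.FAILURE;
        taskFinished(); // the step this report describes as missing
    }

    public static void main(String[] args) {
        ExpectedTimeoutHandlingSketch noRetry = new ExpectedTimeoutHandlingSketch(0);
        noRetry.onTaskTimeout(true);
        System.out.println("retry times 0: workflow=" + noRetry.workflowState);

        ExpectedTimeoutHandlingSketch oneRetry = new ExpectedTimeoutHandlingSketch(1);
        oneRetry.onTaskTimeout(true);
        System.out.println("retry times 1: task=" + oneRetry.taskState
                + ", workflow=" + oneRetry.workflowState);
    }
}
```

   With retry times set to 0, as in the reproduction steps, the workflow in this
   sketch moves to FAILURE instead of staying in RUNNING.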
   
   ### How to reproduce
   
   Follow the steps listed under Replay in the "What happened" section above.
   
   ### Anything else
   
   [INFO] 2023-06-21 13:47:50.588 +0800 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThreadPool:[97] 
- [WorkflowInstance-0][TaskInstance-15] - Submit state event success, 
stateEvent: TaskStateEvent(processInstanceId=15, taskInstanceId=21, 
taskCode=9960682788640, status=TaskExecutionStatus{code=1, desc='running'}, 
type=TASK_STATE_CHANGE, key=null, channel=null, context=null)
   [INFO] 2023-06-21 13:47:50.669 +0800 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[287] 
- [WorkflowInstance-15][TaskInstance-21] - Begin to handle state event, 
TaskStateEvent(processInstanceId=15, taskInstanceId=21, taskCode=9960682788640, 
status=TaskExecutionStatus{code=1, desc='running'}, type=TASK_STATE_CHANGE, 
key=null, channel=null, context=null)
   [INFO] 2023-06-21 13:47:50.669 +0800 
org.apache.dolphinscheduler.server.master.event.TaskStateEventHandler:[57] - 
[WorkflowInstance-15][TaskInstance-21] - Handle task instance state event, the 
current task instance state TaskExecutionStatus{code=1, desc='running'} will be 
changed to TaskExecutionStatus{code=1, desc='running'}
   [INFO] 2023-06-21 13:47:55.588 +0800 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThreadPool:[97] 
- [WorkflowInstance-0][TaskInstance-15] - Submit state event success, 
stateEvent: TaskStateEvent(processInstanceId=15, taskInstanceId=21, 
taskCode=9960682788640, status=TaskExecutionStatus{code=1, desc='running'}, 
type=TASK_STATE_CHANGE, key=null, channel=null, context=null)
   [INFO] 2023-06-21 13:47:55.681 +0800 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[287] 
- [WorkflowInstance-15][TaskInstance-21] - Begin to handle state event, 
TaskStateEvent(processInstanceId=15, taskInstanceId=21, taskCode=9960682788640, 
status=TaskExecutionStatus{code=1, desc='running'}, type=TASK_STATE_CHANGE, 
key=null, channel=null, context=null)
   [INFO] 2023-06-21 13:47:55.681 +0800 
org.apache.dolphinscheduler.server.master.event.TaskStateEventHandler:[57] - 
[WorkflowInstance-15][TaskInstance-21] - Handle task instance state event, the 
current task instance state TaskExecutionStatus{code=1, desc='running'} will be 
changed to TaskExecutionStatus{code=1, desc='running'}
   [INFO] 2023-06-21 13:47:58.957 +0800 
org.apache.dolphinscheduler.server.master.task.MasterHeartBeatTask:[70] - 
[WorkflowInstance-0][TaskInstance-0] - Success write master heartBeatInfo into 
registry, masterRegistryPath: /nodes/master/172.22.21.6:5678, heartBeatInfo: 
{"startupTime":1687325447045,"reportTime":1687326478929,"cpuUsage":0.0,"memoryUsage":0.41,"loadAverage":0.0,"availablePhysicalMemorySize":9.06,"maxCpuloadAvg":32.0,"reservedMemory":0.3,"diskAvailable":1002.83,"processId":3160}
   [INFO] 2023-06-21 13:48:00.589 +0800 
org.apache.dolphinscheduler.server.master.runner.StateWheelExecuteThread:[281] 
- [WorkflowInstance-15][TaskInstance-15] - Task instance is timeout, adding 
task timeout event and remove the check
   [INFO] 2023-06-21 13:48:00.589 +0800 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThreadPool:[97] 
- [WorkflowInstance-15][TaskInstance-15] - Submit state event success, 
stateEvent: TaskStateEvent(processInstanceId=15, taskInstanceId=21, 
taskCode=9960682788640, status=null, type=TASK_TIMEOUT, key=null, channel=null, 
context=null)
   [INFO] 2023-06-21 13:48:00.589 +0800 
org.apache.dolphinscheduler.server.master.runner.StateWheelExecuteThread:[281] 
- [WorkflowInstance-15][TaskInstance-15] - Task instance is timeout, adding 
task timeout event and remove the check
   [INFO] 2023-06-21 13:48:00.589 +0800 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThreadPool:[97] 
- [WorkflowInstance-15][TaskInstance-15] - Submit state event success, 
stateEvent: TaskStateEvent(processInstanceId=15, taskInstanceId=21, 
taskCode=9960682788640, status=null, type=TASK_TIMEOUT, key=null, channel=null, 
context=null)
   [INFO] 2023-06-21 13:48:00.589 +0800 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThreadPool:[97] 
- [WorkflowInstance-0][TaskInstance-15] - Submit state event success, 
stateEvent: TaskStateEvent(processInstanceId=15, taskInstanceId=21, 
taskCode=9960682788640, status=TaskExecutionStatus{code=1, desc='running'}, 
type=TASK_STATE_CHANGE, key=null, channel=null, context=null)
   [INFO] 2023-06-21 13:48:00.589 +0800 
org.apache.dolphinscheduler.server.master.runner.StateWheelExecuteThread:[157] 
- [WorkflowInstance-15][TaskInstance-15] - Workflow instance 15 timeout, adding 
timeout event
   [INFO] 2023-06-21 13:48:00.589 +0800 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThreadPool:[97] 
- [WorkflowInstance-15][TaskInstance-15] - Submit state event success, 
stateEvent: WorkflowStateEvent(processInstanceId=15, taskInstanceId=null, 
status=null, type=PROCESS_TIMEOUT, key=null, channel=null, context=null)
   [INFO] 2023-06-21 13:48:00.589 +0800 
org.apache.dolphinscheduler.server.master.runner.StateWheelExecuteThread:[160] 
- [WorkflowInstance-15][TaskInstance-15] - Workflow instance timeout, added 
timeout event
   [INFO] 2023-06-21 13:48:00.593 +0800 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[287] 
- [WorkflowInstance-15][TaskInstance-21] - Begin to handle state event, 
TaskStateEvent(processInstanceId=15, taskInstanceId=21, taskCode=9960682788640, 
status=null, type=TASK_TIMEOUT, key=null, channel=null, context=null)
   [INFO] 2023-06-21 13:48:00.593 +0800 
org.apache.dolphinscheduler.server.master.event.TaskTimeoutStateEventHandler:[53]
 - [WorkflowInstance-15][TaskInstance-21] - Handle task instance state timout 
event, taskInstanceId: 21
   [INFO] 2023-06-21 13:48:00.594 +0800 TaskLogLogger-class 
org.apache.dolphinscheduler.server.master.runner.task.DependentTaskProcessor:[151]
 - [WorkflowInstance-15][TaskInstance-21] - dependent taskInstanceId: 21 
timeout, taskName: w2, strategy: warnfailed
   [INFO] 2023-06-21 13:48:00.652 +0800 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[287] 
- [WorkflowInstance-15][TaskInstance-21] - Begin to handle state event, 
TaskStateEvent(processInstanceId=15, taskInstanceId=21, taskCode=9960682788640, 
status=null, type=TASK_TIMEOUT, key=null, channel=null, context=null)
   [INFO] 2023-06-21 13:48:00.652 +0800 
org.apache.dolphinscheduler.server.master.event.TaskTimeoutStateEventHandler:[53]
 - [WorkflowInstance-15][TaskInstance-21] - Handle task instance state timout 
event, taskInstanceId: 21
   [INFO] 2023-06-21 13:48:00.682 +0800 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[287] 
- [WorkflowInstance-15][TaskInstance-21] - Begin to handle state event, 
TaskStateEvent(processInstanceId=15, taskInstanceId=21, taskCode=9960682788640, 
status=TaskExecutionStatus{code=1, desc='running'}, type=TASK_STATE_CHANGE, 
key=null, channel=null, context=null)
   [INFO] 2023-06-21 13:48:00.682 +0800 
org.apache.dolphinscheduler.server.master.event.TaskStateEventHandler:[57] - 
[WorkflowInstance-15][TaskInstance-21] - Handle task instance state event, the 
current task instance state TaskExecutionStatus{code=6, desc='failure'} will be 
changed to TaskExecutionStatus{code=1, desc='running'}
   [WARN] 2023-06-21 13:48:00.682 +0800 
org.apache.dolphinscheduler.server.master.event.TaskStateEventHandler:[68] - 
[WorkflowInstance-15][TaskInstance-21] - The current task instance state is 
TaskExecutionStatus{code=6, desc='failure'}, but the task state event status is 
TaskExecutionStatus{code=1, desc='running'}, so the task state event will be 
ignored
   [ERROR] 2023-06-21 13:48:00.682 +0800 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[292] 
- [WorkflowInstance-15][TaskInstance-21] - State event handle error, will 
remove this event: TaskStateEvent(processInstanceId=15, taskInstanceId=21, 
taskCode=9960682788640, status=TaskExecutionStatus{code=1, desc='running'}, 
type=TASK_STATE_CHANGE, key=null, channel=null, context=null)
   org.apache.dolphinscheduler.server.master.event.StateEventHandleError: The 
current task instance state is TaskExecutionStatus{code=6, desc='failure'}, but 
the task state event status is TaskExecutionStatus{code=1, desc='running'}, so 
the task state event will be ignored
           at 
org.apache.dolphinscheduler.server.master.event.TaskStateEventHandler.handleStateEvent(TaskStateEventHandler.java:69)
                                at 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable.handleEvents(WorkflowExecuteRunnable.java:288)
           at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:750)
   [INFO] 2023-06-21 13:48:01.683 +0800 
org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[287] 
- [WorkflowInstance-15][TaskInstance-null] - Begin to handle state event, 
WorkflowStateEvent(processInstanceId=15, taskInstanceId=null, status=null, 
type=PROCESS_TIMEOUT, key=null, channel=null, context=null)
   [INFO] 2023-06-21 13:48:01.683 +0800 
org.apache.dolphinscheduler.server.master.event.WorkflowTimeoutStateEventHandler:[35]
 - [WorkflowInstance-15][TaskInstance-null] - Handle workflow instance timeout 
event
   [INFO] 2023-06-21 13:48:08.976 +0800 
org.apache.dolphinscheduler.server.master.task.MasterHeartBeatTask:[70] - 
[WorkflowInstance-0][TaskInstance-0] - Success write master heartBeatInfo into 
registry, masterRegistryPath: /nodes/master/172.22.21.6:5678, heartBeatInfo: 
{"startupTime":1687325447045,"reportTime":1687326488957,"cpuUsage":0.0,"memoryUsage":0.41,"loadAverage":0.0,"availablePhysicalMemorySize":9.05,"maxCpuloadAvg":32.0,"reservedMemory":0.3,"diskAvailable":1002.83,"processId":3160}
   [INFO] 2023-06-21 13:48:14.507 +0800 
org.apache.dolphinscheduler.service.log.LoggerRequestProcessor:[79] - 
[WorkflowInstance-0][TaskInstance-0] - received command : Command 
[type=ROLL_VIEW_LOG_REQUEST, opaque=1252, bodyLen=132]
   [INFO] 2023-06-21 13:48:18.994 +0800 
org.apache.dolphinscheduler.server.master.task.MasterHeartBeatTask:[70] - 
[WorkflowInstance-0][TaskInstance-0] - Success write master heartBeatInfo into 
registry, masterRegistryPath: /nodes/master/172.22.21.6:5678, heartBeatInfo: 
{"startupTime":1687325447045,"reportTime":1687326498976,"cpuUsage":0.0,"memoryUsage":0.42,"loadAverage":0.0,"availablePhysicalMemorySize":9.05,"maxCpuloadAvg":32.0,"reservedMemory":0.3,"diskAvailable":1002.83,"processId":3160}
   
   ### Version
   
   dev
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)

