CloudSen opened a new issue, #16369:
URL: https://github.com/apache/dolphinscheduler/issues/16369

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   I think the state synchronization mechanism has a critical flaw.
   State transition is a process, but the notification mechanism between process instances currently relies on instantaneous states.
   For example, suppose the workflows also have a timeout configured. In that scenario, if a workflow times out, the next serially waiting workflow may get stuck:
   
   ```
   @startuml
   StateWheelExecuteThread --> WorkflowTimeoutStateEventHandler: send process A PROCESS_TIMEOUT event
   WorkflowTimeoutStateEventHandler -> WorkflowExecuteRunnable: processTimeout
   WorkflowExecuteRunnable --> WorkflowStateEventHandler: send process A STOP event
   WorkflowStateEventHandler -> WorkflowExecuteRunnable: endProcess
   WorkflowExecuteRunnable -> WorkflowExecuteRunnable: checkSerialProcess
   WorkflowExecuteRunnable --> ProcessServiceImpl: send process B RECOVER_SERIAL_WAIT command
   ProcessServiceImpl -> ProcessServiceImpl: STEP 1: handleCommand (if the state of process A is RUNNING_PROCESS_STATE, the state of process B changes back to SERIAL_WAIT)
   WorkflowStateEventHandler -> WorkflowStateEventHandler: STEP 2: update process A to STOP
   @enduml
   ```
   
![image](https://github.com/user-attachments/assets/f382fc32-1627-4abb-afad-141541e96514)
   
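   Because the relative order of STEP 1 and STEP 2 is not guaranteed, STEP 1 can run while process A is still in a running state. Process B then changes back to SERIAL_WAIT, and since no further recovery command is sent once A reaches STOP, B and all subsequent instances can remain stuck in the "serial wait" state.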
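   The two interleavings can be compared deterministically with a small model. This is a minimal Python sketch, not DolphinScheduler code: the `states` dict stands in for the persisted instance states, the plain `"RUNNING"` string stands in for any state in `RUNNING_PROCESS_STATE`, and `handle_recover_command` / `end_process` are hypothetical simplifications of STEP 1 and STEP 2 in the diagram.

```python
from typing import Dict, List

# Hypothetical, simplified state names; not DolphinScheduler's enum.
RUNNING, STOP, SERIAL_WAIT = "RUNNING", "STOP", "SERIAL_WAIT"


def handle_recover_command(states: Dict[str, str], predecessor: str, waiting: str) -> None:
    """Hypothetical STEP 1: recover the waiting instance unless its predecessor still looks running."""
    if states[predecessor] == RUNNING:
        states[waiting] = SERIAL_WAIT  # back to waiting; no later command will wake it
    else:
        states[waiting] = RUNNING


def end_process(states: Dict[str, str], instance: str) -> None:
    """Hypothetical STEP 2: record the final state of the finished instance."""
    states[instance] = STOP


def simulate(order: List[str]) -> Dict[str, str]:
    """Run STEP 1 and STEP 2 in the given order, starting from A running and B waiting."""
    states = {"A": RUNNING, "B": SERIAL_WAIT}
    steps = {
        "STEP1": lambda: handle_recover_command(states, "A", "B"),
        "STEP2": lambda: end_process(states, "A"),
    }
    for name in order:
        steps[name]()
    return states


print(simulate(["STEP2", "STEP1"]))  # {'A': 'STOP', 'B': 'RUNNING'}
print(simulate(["STEP1", "STEP2"]))  # {'A': 'STOP', 'B': 'SERIAL_WAIT'}
```

   In the second ordering, A ends up in STOP but B has already been sent back to SERIAL_WAIT, and nothing in the model re-issues the recover command, so B stays waiting.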
   
   
   ### What you expected to happen
   
   Before resuming subsequent instances, a workflow instance should ensure that its own state has been fully updated.
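   One way to express this expectation, sketched under assumptions: publish the recover command only after the instance's own final state has been written, so the command handler can never read a stale running state. This is a hypothetical Python model, not the DolphinScheduler implementation or its eventual fix; the `queue.Queue` stands in for the command channel, the dict for the persisted instance states, and `command_worker` / `end_process_then_recover` are illustrative names.

```python
import queue
import threading
from typing import Dict, Tuple

# Hypothetical, simplified state names; not DolphinScheduler's enum.
RUNNING, STOP, SERIAL_WAIT = "RUNNING", "STOP", "SERIAL_WAIT"


def command_worker(states: Dict[str, str], lock: threading.Lock,
                   commands: "queue.Queue[Tuple[str, str]]") -> None:
    """Hypothetical stand-in for the handler that runs STEP 1 on another thread."""
    predecessor, waiting = commands.get()  # blocks until a command is published
    with lock:
        if states[predecessor] == RUNNING:
            states[waiting] = SERIAL_WAIT
        else:
            states[waiting] = RUNNING


def end_process_then_recover(states: Dict[str, str], lock: threading.Lock,
                             commands: "queue.Queue[Tuple[str, str]]",
                             instance: str, next_instance: str) -> None:
    # STEP 2 first: write the finished instance's final state...
    with lock:
        states[instance] = STOP
    # ...then publish the recover command, so STEP 1 cannot observe RUNNING for it.
    commands.put((instance, next_instance))


def run() -> Dict[str, str]:
    states = {"A": RUNNING, "B": SERIAL_WAIT}
    lock = threading.Lock()
    commands: "queue.Queue[Tuple[str, str]]" = queue.Queue()
    worker = threading.Thread(target=command_worker, args=(states, lock, commands))
    worker.start()
    end_process_then_recover(states, lock, commands, "A", "B")
    worker.join()
    return states


print(run())  # {'A': 'STOP', 'B': 'RUNNING'}
```

   The guarantee comes from ordering rather than timing: the command is enqueued only after the state write, so the handler sees STOP however the threads are scheduled. In a database-backed scheduler the analogous step would presumably be committing the state update before inserting the command; that mapping is an assumption, not a description of the project's code.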
   
   ### How to reproduce
   
   - Create Workflow A and Workflow B, both scheduled to run every minute, with the Serial Wait execution type.
   - In Workflow A, create a SUB_PROCESS task node that references Workflow B.
   - Bring both Workflow A and Workflow B online, and bring their schedules online.
   - Observe the state changes.
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   3.1.1
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [ ] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   