Re: [I] [Bug] [MASTER] When workflows are frequently scheduled, the state synchronization mechanism has critical errors. [dolphinscheduler]

via GitHub Wed, 24 Jul 2024 23:26:39 -0700


github-actions[bot] commented on issue #16369:
URL: https://github.com/apache/dolphinscheduler/issues/16369#issuecomment-2249553292


   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   State transition is a process rather than an atomic event, but the notification mechanism between processes currently relies on instantaneous states.
   For example, suppose the workflows also have a timeout configured. In that case, after one workflow times out, the next serially waiting workflow may get stuck.
   
   ```
   @startuml
   StateWheelExecuteThread --> WorkflowTimeoutStateEventHandler: send process A PROCESS_TIMEOUT event
   WorkflowTimeoutStateEventHandler -> WorkflowExecuteRunnable: processTimeout
   WorkflowExecuteRunnable --> WorkflowStateEventHandler: send process A STOP event
   WorkflowStateEventHandler -> WorkflowExecuteRunnable: endProcess
   WorkflowExecuteRunnable -> WorkflowExecuteRunnable: checkSerialProcess
   WorkflowExecuteRunnable --> ProcessServiceImpl: send process B RECOVER_SERIAL_WAIT command
   ProcessServiceImpl -> ProcessServiceImpl: STEP 1: handleCommand (if state of process A is RUNNING_PROCESS_STATE, state of process B will change back to SERIAL_WAIT)
   WorkflowStateEventHandler -> WorkflowStateEventHandler: STEP 2: update process A to STOP
   @enduml
   ```
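The two possible orderings of STEP 1 and STEP 2 in the diagram can be replayed deterministically. The sketch below is a hypothetical, heavily simplified model, not DolphinScheduler code: `SerialWaitRaceDemo`, its `State` enum, `Store`, and the `step1...`/`step2...` methods are invented for illustration. STEP 1 applies the rule from the diagram: if process A still looks running, process B goes back to `SERIAL_WAIT`.

```java
public class SerialWaitRaceDemo {

    // Hypothetical, simplified states; not the real DolphinScheduler enum.
    enum State { RUNNING, STOP, SERIAL_WAIT }

    // Hypothetical shared view of the two instances' current states.
    static final class Store {
        State processA = State.RUNNING;
        State processB = State.SERIAL_WAIT;
    }

    // STEP 1: handle process B's RECOVER_SERIAL_WAIT command.
    // If A still looks running, B is put back into SERIAL_WAIT; otherwise B resumes.
    static void step1HandleRecoverCommand(Store store) {
        store.processB = (store.processA == State.RUNNING) ? State.SERIAL_WAIT : State.RUNNING;
    }

    // STEP 2: record the end state of process A.
    static void step2UpdateAToStop(Store store) {
        store.processA = State.STOP;
    }

    // Replays one interleaving and returns process B's final state.
    static State replay(boolean step1First) {
        Store store = new Store();
        if (step1First) {
            step1HandleRecoverCommand(store);
            step2UpdateAToStop(store);
        } else {
            step2UpdateAToStop(store);
            step1HandleRecoverCommand(store);
        }
        return store.processB;
    }

    public static void main(String[] args) {
        System.out.println("STEP 2 then STEP 1 -> B is " + replay(false)); // B is RUNNING
        System.out.println("STEP 1 then STEP 2 -> B is " + replay(true));  // B is SERIAL_WAIT
    }
}
```

Only the ordering differs between the two runs, which is why the outcome depends on thread scheduling rather than on the workflow definitions.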
   
![image](https://github.com/user-attachments/assets/f382fc32-1627-4abb-afad-141541e96514)
   
   Since the order of STEP 1 and STEP 2 is currently not guaranteed, STEP 1 can run while process A is still recorded as running. Process B is then changed back to SERIAL_WAIT, nothing resumes it afterwards, and all subsequent instances may remain stuck in the "serial wait" state.
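Why a single bad interleaving blocks every later instance can be modeled as a chain. This is a minimal sketch under an assumption read from the diagram: each waiting instance is woken only by the one RECOVER_SERIAL_WAIT command its predecessor sends when it ends. `SerialChainDemo`, `runChain`, and the `raceAt` parameter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class SerialChainDemo {

    // Hypothetical, simplified states; not the real DolphinScheduler enum.
    enum State { SERIAL_WAIT, RUNNING, STOP }

    // Instance 0 starts RUNNING and the rest wait. When an instance ends it sends one
    // recover command to its successor. If raceAt == i, the command from instance i is
    // handled before i's STOP is recorded (STEP 1 before STEP 2).
    static List<State> runChain(int size, int raceAt) {
        List<State> states = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            states.add(i == 0 ? State.RUNNING : State.SERIAL_WAIT);
        }
        for (int i = 0; i < size; i++) {
            if (states.get(i) != State.RUNNING) {
                break; // never resumed, so it never ends and never wakes its successor
            }
            boolean lastInstance = i == size - 1;
            if (i == raceAt && !lastInstance) {
                // STEP 1 first: predecessor still looks RUNNING, successor goes back to SERIAL_WAIT
                states.set(i + 1, State.SERIAL_WAIT);
                states.set(i, State.STOP); // STEP 2 lands too late
            } else {
                states.set(i, State.STOP); // STEP 2 first
                if (!lastInstance) {
                    states.set(i + 1, State.RUNNING); // STEP 1 sees STOP, successor resumes
                }
            }
        }
        return states;
    }

    public static void main(String[] args) {
        System.out.println(runChain(4, -1)); // [STOP, STOP, STOP, STOP]
        System.out.println(runChain(4, 0));  // [STOP, SERIAL_WAIT, SERIAL_WAIT, SERIAL_WAIT]
    }
}
```

With `runChain(4, 1)` the first two instances finish and the last two stay in `SERIAL_WAIT`: in this model, everything after the first lost wake-up stalls.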
   
   
![image](https://github.com/user-attachments/assets/1148269f-a2f6-4c27-aade-bc3d66e2eb86)
   
   
   
   ### What you expected to happen
   
   Before a workflow instance resumes its subsequent instances, it must ensure that its own state has been fully updated.
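One way to express this expected ordering is to record the instance's own final state before handing the recover command to another thread. This is a sketch under stated assumptions, not the actual fix: the `STORE` map stands in for the stored instance states, and `OrderedRecoverDemo`, `handleRecoverSerialWait`, and `endProcessThenRecover` are hypothetical names. Because `STORE.put` completes before `CompletableFuture.supplyAsync` submits the handler, and submission to an executor happens-before the task runs, the handler always observes `STOP`.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OrderedRecoverDemo {

    // Hypothetical, simplified states; not the real DolphinScheduler enum.
    enum State { RUNNING, STOP, SERIAL_WAIT }

    // Hypothetical stand-in for the stored instance states.
    static final Map<String, State> STORE = new ConcurrentHashMap<>();

    // Command handler (STEP 1): resume the waiting instance only if its
    // predecessor is no longer running.
    static State handleRecoverSerialWait(String predecessor, String waiting) {
        State newState = STORE.get(predecessor) == State.RUNNING ? State.SERIAL_WAIT : State.RUNNING;
        STORE.put(waiting, newState);
        return newState;
    }

    // End-of-workflow path with the expected ordering: update our own state (STEP 2)
    // first, and only then run the recover command on another thread (STEP 1).
    static State endProcessThenRecover(ExecutorService commandExecutor, String self, String next) {
        STORE.put(self, State.STOP);
        return CompletableFuture
                .supplyAsync(() -> handleRecoverSerialWait(self, next), commandExecutor)
                .join();
    }

    public static void main(String[] args) {
        STORE.put("A", State.RUNNING);
        STORE.put("B", State.SERIAL_WAIT);
        ExecutorService commandExecutor = Executors.newSingleThreadExecutor();
        try {
            State b = endProcessThenRecover(commandExecutor, "A", "B");
            System.out.println("B is now " + b); // B is now RUNNING
        } finally {
            commandExecutor.shutdown();
        }
    }
}
```

If the state lives in a database, the same idea would mean committing process A's update before the RECOVER_SERIAL_WAIT command is sent; the sketch does not model transactions.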
   
   ### How to reproduce
   
   - Create Workflow A and Workflow B, both scheduled to run every minute.
   - In Workflow A, create a SUB_PROCESS task node that references Workflow B.
   - Bring both Workflow A and Workflow B online, then bring their schedules online.
   - Observe the state changes.
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   dev
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
