sketchmind opened a new issue, #16606:
URL: https://github.com/apache/dolphinscheduler/issues/16606

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and 
found no similar feature requirement.
   
   
   ### Description
   
   
   
![image](https://github.com/user-attachments/assets/11173909-c2c4-4d08-83cf-7dc59cfa7989)
   
   When a workflow uses the failure strategy that lets it continue after a 
task fails, the "Recovery Failed" operation has a limitation. If one task in 
the workflow fails while other tasks are still running, the "Recovery Failed" 
option is unavailable. We can only recover the workflow after the entire 
workflow has failed, which delays the failed task and all of its downstream 
tasks.
   
   For example, in the scenario shown in the image above, task **B1** in 
**Workflow1** has failed, while other tasks such as **A1** (which 
**Workflow2** depends on) are still running. If we wait for **Workflow1** to 
fail before recovering **B1**, the completion of **B1** is delayed. If we 
instead kill **Workflow1** immediately and then recover it, **A1** is killed 
as well, so the dependent **Workflow2** fails unnecessarily and has to be 
recovered too.
   
   **Proposed Feature:**
   We suggest adding a feature that allows us to recover failed tasks within a 
running workflow. This would provide a way to proactively recover tasks like 
**B1** before the entire workflow fails, giving workflows that would otherwise 
fail the opportunity to complete successfully.
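
   To make the requested behavior concrete, here is a minimal Python sketch 
of the scenario. It is only a toy model: the `Workflow` class, the `State` 
enum and both `recover_failed_*` methods are hypothetical names made up for 
this illustration and are not DolphinScheduler's actual API or data model.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    FAILED = "failed"
    SUCCESS = "success"


class Workflow:
    """Toy workflow instance using a continue-on-failure strategy (hypothetical)."""

    def __init__(self, tasks):
        self.tasks = dict(tasks)  # task name -> State

    def state(self):
        states = set(self.tasks.values())
        if State.RUNNING in states:
            # The workflow stays RUNNING as long as any task is running.
            return State.RUNNING
        if State.FAILED in states:
            return State.FAILED
        return State.SUCCESS

    def recover_failed_current(self):
        # Behavior described in this issue: recovery is only possible
        # once the whole workflow has failed.
        if self.state() is not State.FAILED:
            raise RuntimeError("workflow has not failed yet")
        return self._resubmit_failed()

    def recover_failed_proposed(self):
        # Proposed behavior: resubmit failed tasks right away and
        # leave running tasks untouched.
        return self._resubmit_failed()

    def _resubmit_failed(self):
        retried = [name for name, s in self.tasks.items() if s is State.FAILED]
        for name in retried:
            self.tasks[name] = State.RUNNING
        return retried


# Workflow1 from the image: A1 is still running, B1 has failed.
wf1 = Workflow({"A1": State.RUNNING, "B1": State.FAILED})

try:
    wf1.recover_failed_current()
    blocked = False
except RuntimeError:
    blocked = True  # recovery is rejected while A1 is still running

retried = wf1.recover_failed_proposed()  # only B1 is resubmitted
```

   In this sketch the proposed operation resubmits only **B1**, and **A1** 
keeps running, so nothing that **Workflow2** depends on gets killed.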
   
   This enhancement could save time and prevent cascading failures in dependent 
workflows. It would be particularly useful in scenarios where we can foresee a 
task's failure leading to the workflow’s eventual failure.
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
