weeway commented on issue #10443: URL: https://github.com/apache/dolphinscheduler/issues/10443#issuecomment-1170705973
> I think this is the bug of 2.0.5. It has been solved in #10541 and will be released in 2.0.6

I switched to the branch `2.0.6-prepare` and tested it. The problem still exists.

### The topology

There are three processes: pA, pB and pC.

- pA has a task pAt1
- pB has a dependent task pBt1 that depends on pAt1
- pC has a long-running task pCt1

### How to reproduce it?

- start pB
- start pC with failedStrategy `End`
- stop pC
- start pA and wait for it to finish
- in the UI, the dependent task pBt1 stays in the running state and never completes

The settings used to start pC:

<img width="836" alt="image" src="https://user-images.githubusercontent.com/12637868/176584789-af83ecfc-9606-4c77-b706-ff7caa7829ad.png">

### The reason

All `WorkflowExecuteThread` instances share the same `taskRetryCheckList`. When pC is started with failedStrategy `End` and then stopped, the whole `taskRetryCheckList` is cleared, including the entries that belong to other workflows such as pB. **The correct logic is to clear only the task instances that belong to pC.**

The critical code in `WorkflowExecuteThread`:

`org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread#taskFinished`

<img width="872" alt="image" src="https://user-images.githubusercontent.com/12637868/176585635-9b49c84d-ddd9-472c-a797-a5cc6d2151e9.png">

`org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread#killAllTasks`

<img width="626" alt="image" src="https://user-images.githubusercontent.com/12637868/176585703-cc267b8a-dec9-476e-8015-7aa8463345c5.png">
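To make the difference between clearing the shared list and clearing only one workflow's entries concrete, here is a minimal, self-contained Java sketch. It is a simplified model under stated assumptions, not the real DolphinScheduler code: `RetryEntry`, `WorkflowThreadSketch`, `killAllTasksBuggy`, `killAllTasksFixed` and `countFor` are hypothetical names, and the real `taskRetryCheckList` may hold different element types or live in a different structure. It only illustrates the idea that each workflow thread should remove the entries tagged with its own process instance id instead of calling `clear()` on a list that every thread shares.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical entry: records which process instance owns a task waiting for a re-check.
// NOT a real DolphinScheduler class.
final class RetryEntry {
    final int processInstanceId;
    final int taskInstanceId;

    RetryEntry(int processInstanceId, int taskInstanceId) {
        this.processInstanceId = processInstanceId;
        this.taskInstanceId = taskInstanceId;
    }
}

// Hypothetical stand-in for WorkflowExecuteThread, reduced to the shared-list behaviour.
final class WorkflowThreadSketch {
    // One list shared by every workflow thread, modelling the shared taskRetryCheckList.
    static final List<RetryEntry> SHARED_RETRY_CHECK_LIST = new CopyOnWriteArrayList<>();

    private final int processInstanceId;

    WorkflowThreadSketch(int processInstanceId) {
        this.processInstanceId = processInstanceId;
    }

    // Registers a task of this workflow for a later re-check.
    void addRetryCheck(int taskInstanceId) {
        SHARED_RETRY_CHECK_LIST.add(new RetryEntry(processInstanceId, taskInstanceId));
    }

    // Buggy behaviour: stopping one workflow wipes the entries of every other workflow too.
    void killAllTasksBuggy() {
        SHARED_RETRY_CHECK_LIST.clear();
    }

    // Fixed behaviour: only the entries owned by this process instance are removed.
    void killAllTasksFixed() {
        SHARED_RETRY_CHECK_LIST.removeIf(e -> e.processInstanceId == processInstanceId);
    }

    // Counts the entries currently registered for a given process instance.
    static long countFor(int processInstanceId) {
        return SHARED_RETRY_CHECK_LIST.stream()
                .filter(e -> e.processInstanceId == processInstanceId)
                .count();
    }
}
```

In this model, if pB (id 2) has registered its dependent task and pC (id 3) is stopped, `killAllTasksFixed()` leaves pB's entry in place, while `killAllTasksBuggy()` drops it, which matches the symptom of pBt1 never leaving the running state. `CopyOnWriteArrayList` is used only so the sketch is safe to share between threads; the real implementation may choose a different concurrent collection.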
