hzyangkai opened a new issue, #12968: URL: https://github.com/apache/dolphinscheduler/issues/12968
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar feature requirement.

### Description

At present, when a worker or master crashes, every running task is killed and restarted, regardless of its type. This is unreasonable for tasks running on YARN.

A more reasonable design would be:

1. If a task runs in an external resource manager (e.g. YARN), `ShellCommandExecutor` should exit immediately after submitting the task and obtaining the appId. The worker then reports the appId to the master and, at the same time, starts monitoring the task status by appId.
2. When a worker crashes, the master should send the same task, together with its appId, to another worker, which then starts monitoring the same task.
3. When the master crashes, the master should try to rebuild the channel with the worker.
4. For tasks that run locally on the worker, keep the original logic.

### Are you willing to submit a PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
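
For illustration only, the worker-failover decision proposed in the Description (points 1, 2 and 4) could be sketched roughly as follows. This is a minimal sketch, not DolphinScheduler code: `YarnFailoverSketch`, `TaskRecord`, `Master`, `Action` and every method name are hypothetical, and it assumes that a task whose appId has not yet been reported falls back to the existing kill-and-restart behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these types exist in DolphinScheduler.
final class YarnFailoverSketch {

    /** What the master does with a task when its worker crashes. */
    enum Action { KILL_AND_RESTART, REATTACH_BY_APP_ID }

    /** State the master keeps per running task (assumed shape). */
    static final class TaskRecord {
        final boolean external; // true if submitted to an external manager such as YARN
        String appId;           // reported by the worker after submission; null until then
        String workerHost;      // worker currently responsible for the task

        TaskRecord(boolean external, String workerHost) {
            this.external = external;
            this.workerHost = workerHost;
        }
    }

    static final class Master {
        private final Map<Integer, TaskRecord> tasks = new HashMap<>();

        /** Master hands a task to a worker. */
        void dispatch(int taskId, boolean external, String workerHost) {
            tasks.put(taskId, new TaskRecord(external, workerHost));
        }

        /** Point 1: the worker reports the appId right after submission. */
        void onAppIdReported(int taskId, String appId) {
            tasks.get(taskId).appId = appId;
        }

        /** Points 2 and 4: choose between re-attaching and restarting. */
        Action onWorkerCrash(int taskId, String newWorkerHost) {
            TaskRecord task = tasks.get(taskId);
            task.workerHost = newWorkerHost;
            if (task.external && task.appId != null) {
                // The application keeps running externally; the new worker only monitors it.
                return Action.REATTACH_BY_APP_ID;
            }
            // Local task, or appId never reported (assumption): keep the original logic.
            task.appId = null;
            return Action.KILL_AND_RESTART;
        }

        String workerOf(int taskId) { return tasks.get(taskId).workerHost; }

        String appIdOf(int taskId) { return tasks.get(taskId).appId; }
    }

    public static void main(String[] args) {
        Master master = new Master();

        master.dispatch(1, true, "worker-a");              // task submitted to YARN
        master.onAppIdReported(1, "application_0001");     // made-up appId
        System.out.println(master.onWorkerCrash(1, "worker-b"));

        master.dispatch(2, false, "worker-a");             // task running locally
        System.out.println(master.onWorkerCrash(2, "worker-b"));
    }
}
```

The point of keeping the appId on the master side is that the second worker needs nothing from the crashed one: it receives the task together with the appId and resumes monitoring from there.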
