hzyangkai opened a new issue, #12968:
URL: https://github.com/apache/dolphinscheduler/issues/12968

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and 
found no similar feature requirement.
   
   
   ### Description
   
   At present, whenever a worker or master crashes, the running task is killed 
and restarted regardless of its type. This is unreasonable for tasks running on 
YARN.
   A more reasonable approach would be:
   1. If the task runs in an external resource manager such as YARN, 
ShellCommandExecutor should exit immediately after submitting the task and 
obtaining the appid. The worker then reports the appid to the master and, at 
the same time, starts monitoring the task status by appid.
   2. When a worker crashes, the master should send the same task, together 
with its appid, to another worker, which then starts monitoring that same task.
   3. When the master crashes, the master should try to rebuild the channel 
with the worker.
   4. For tasks that run locally on the worker, keep the original logic.
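
The flow in steps 1 and 2 could be sketched roughly as follows. This is only an illustration in Python, not DolphinScheduler code: `extract_app_id`, `TaskRecord`, `monitor`, `fail_over`, and the `fetch_state` callback are all hypothetical names, and the only outside facts assumed are that YARN application IDs have the form `application_<timestamp>_<sequence>` and that `FINISHED`, `FAILED`, and `KILLED` are terminal application states.

```python
import re
import time
from dataclasses import dataclass
from typing import Callable, Optional

# YARN application IDs look like application_<clusterTimestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")

# Application states after which no further polling is needed.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def extract_app_id(submit_output: str) -> Optional[str]:
    """Return the first appid found in the submit command's output, if any."""
    match = APP_ID_PATTERN.search(submit_output)
    return match.group(0) if match else None


@dataclass
class TaskRecord:
    """Hypothetical master-side record of a task (not a real DolphinScheduler class)."""
    task_id: int
    app_id: Optional[str] = None   # filled in once the worker reports it
    worker: Optional[str] = None   # worker currently responsible for the task


def monitor(app_id: str,
            fetch_state: Callable[[str], str],
            poll_interval_s: float = 0.0,
            max_polls: int = 100) -> str:
    """Poll the application state by appid until it reaches a terminal state.

    fetch_state is supplied by the caller (for example, a wrapper around the
    resource manager); it is an assumption of this sketch.
    """
    for _ in range(max_polls):
        state = fetch_state(app_id)
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_interval_s)
    return "UNKNOWN"


def fail_over(task: TaskRecord, new_worker: str) -> str:
    """Master-side decision when the worker that owned the task has crashed."""
    task.worker = new_worker
    # With a known appid the new worker only monitors; otherwise the task
    # falls back to the original kill-and-restart logic.
    return "MONITOR" if task.app_id else "RESUBMIT"
```

The point of the sketch is that the appid, once reported to the master, is the only state a replacement worker needs in order to take over monitoring without resubmitting the job.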
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
