hzyangkai opened a new pull request, #13030:
URL: https://github.com/apache/dolphinscheduler/pull/13030
All tasks keep running when the master crashes; Spark tasks in cluster mode keep running when the worker crashes

## Purpose of the pull request

This PR achieves the basic goals of the design document in issue #12968:

1. When the worker crashes, tasks of type 3 keep running, and tasks of types 1 and 2 are killed and restarted as before.
2. When the master crashes, all three types of tasks keep running.
3. When both the master and the worker crash, tasks of type 3 keep running, and tasks of types 1 and 2 are killed and restarted as before.

Currently, only the Spark task in cluster mode has been moved from type 2 to type 3.

## Brief change log

1. `WorkerTaskExecuteRunnable#execute`: for tasks of type 3, the submit process exits as soon as the task has been submitted; for tasks of types 1 and 2, the submit process exits only after the task has finished.
2. `WorkerTaskExecuteRunnable#afterExecute`: for tasks of type 3, it reports the running status together with the application ID, then monitors the application's status on YARN, and finally sends the final status to the master once the YARN application has finished; for tasks of types 1 and 2, it reports the final status directly.

## Verify this pull request

Manually verified the change by testing locally.

### Master crashes

1. When the master crashes and is then restarted, all types of tasks re-establish their channel to the worker and keep running.

### Worker crashes

1. When the worker is stopped with `dolphinscheduler-daemon.sh stop worker-server` and then restarted with `dolphinscheduler-daemon.sh start worker-server`, tasks of types 1 and 2 are killed by the worker's shutdown process and then restarted. Tasks of type 3 (Spark tasks in cluster mode) keep running.
2. When the worker is killed with `kill -9 pid` and then restarted with `dolphinscheduler-daemon.sh start worker-server`, tasks of types 1 and 2 keep running, and a new task instance is started in addition. This is not reasonable, but it matches the original DolphinScheduler logic.
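The type-dependent lifecycle in the change log can be illustrated with a minimal Java sketch. It assumes that "type 3" means a task whose real work runs as a YARN application (such as Spark in cluster mode), so the submitting process can exit right after submission. `TaskKind`, `StatusSource`, `reportsToMaster`, and the status strings are hypothetical illustrations, not DolphinScheduler's actual `WorkerTaskExecuteRunnable` API, and the fake status source stands in for real YARN polling.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the execute/afterExecute split; not DolphinScheduler code.
class WorkerLifecycleSketch {

    // Assumption: TYPE_3 = task whose work runs as a YARN application.
    enum TaskKind { TYPE_1, TYPE_2, TYPE_3 }

    // Hypothetical stand-in for a YARN client that polls an application by ID.
    interface StatusSource {
        String poll(String appId);
    }

    // Status messages that would be sent to the master, in order.
    final List<String> reportsToMaster = new ArrayList<>();

    // Like execute(): type 3 returns as soon as the app is submitted (result = app ID);
    // types 1 and 2 block until the task finishes (result = final status).
    String execute(TaskKind kind, Supplier<String> submit, Supplier<String> runToCompletion) {
        return kind == TaskKind.TYPE_3 ? submit.get() : runToCompletion.get();
    }

    // Like afterExecute(): types 1 and 2 report their final status directly.
    // Type 3 reports RUNNING with its app ID, polls until the app leaves RUNNING,
    // then reports that terminal state. If interrupted, it stops without a final report.
    void afterExecute(TaskKind kind, String result, StatusSource yarn, long pollIntervalMs) {
        if (kind != TaskKind.TYPE_3) {
            reportsToMaster.add(result);
            return;
        }
        reportsToMaster.add("RUNNING appId=" + result);
        String state = yarn.poll(result);
        while ("RUNNING".equals(state)) {
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return;
            }
            state = yarn.poll(result);
        }
        reportsToMaster.add(state);
    }

    public static void main(String[] args) {
        WorkerLifecycleSketch worker = new WorkerLifecycleSketch();
        // Fake YARN: the app is RUNNING for two polls, then SUCCEEDED.
        Iterator<String> states = List.of("RUNNING", "RUNNING", "SUCCEEDED").iterator();
        String appId = worker.execute(TaskKind.TYPE_3, () -> "application_1_0001", () -> "unused");
        worker.afterExecute(TaskKind.TYPE_3, appId, id -> states.next(), 0);
        System.out.println(worker.reportsToMaster); // [RUNNING appId=application_1_0001, SUCCEEDED]
    }
}
```

Splitting submission from monitoring is what lets a type 3 task survive a worker restart in this model: the YARN application does not depend on the submitting process, so only the monitoring step needs to be resumed.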
We should use the scripts to stop tasks.

--
This is an automated message from the Apache Git Service.
