hzyangkai opened a new pull request, #13202: URL: https://github.com/apache/dolphinscheduler/pull/13202
## Purpose of the pull request

This PR achieves the basic goals of the design document in the issue https://github.com/apache/dolphinscheduler/issues/12968:

- When the worker crashes, tasks running on YARN keep running; all other tasks are killed and restarted.
- When the master crashes, all tasks keep running.
- When both the master and the worker crash, tasks running on YARN keep running; all other tasks are killed and restarted.

## Brief change log

Adds two abstraction methods to the class AbstractTask.

1. AbstractTask#oneAppIdPerTask: confirms that the task generates exactly one appid. This method affects fault tolerance.
   1. If the task subclass implements oneAppIdPerTask=true, an appid can be collected and reported when the task starts, and fault tolerance is then performed based on that appid. By default, AbstractYarnTask#oneAppIdPerTask=true. The original FlinkStreamTask implementation confuses the appid with the jobid, so for now FlinkStreamTask#oneAppIdPerTask=false. FlinkStreamTask should be changed later so that it can set oneAppIdPerTask=true.
   2. If the task subclass does not implement oneAppIdPerTask, the default oneAppIdPerTask=false applies and appids are not collected when the task starts. When the worker crashes, the task is killed remotely over ssh with kill -9 processId, and a new task is then started.
2. AbstractTask#exitAfterSubmitTask: the submitting process exits immediately after the task is submitted. This method optimizes the submission path and is optional. The default value is false. Currently, only the Spark cluster mode task returns true.

## Verify this pull request

Master crashes:

1. When the master crashes and then restarts, tasks of all types rebuild their channel to the worker and keep running.

Worker crashes:

1. When the worker crashes, tasks implementing oneAppIdPerTask=true can keep running. All other tasks are killed and restarted.

Master & Worker crash:

1. When both the master and the worker crash, tasks implementing oneAppIdPerTask=true can keep running. All other tasks are killed and restarted.
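
The behavior described in the change log and the verification cases can be summarized in a small sketch. This is not the code from the PR: `TaskSketch`, the `*Sketch` subclasses, the `Action` enum, and the `onWorkerCrash`/`onMasterCrash` helpers are hypothetical names introduced only for illustration, and the class hierarchy shown is an assumption. Only the two method names and their default values are taken from the description above.

```java
public class FaultToleranceSketch {

    /** Hypothetical stand-in for AbstractTask; only the two new methods are modeled. */
    static abstract class TaskSketch {
        // Default per the description: the task is not known to produce exactly one appid.
        boolean oneAppIdPerTask() { return false; }

        // Default per the description: the submitting process does not exit after submission.
        boolean exitAfterSubmitTask() { return false; }
    }

    /** Hypothetical stand-in for AbstractYarnTask, which defaults to oneAppIdPerTask=true. */
    static class YarnTaskSketch extends TaskSketch {
        @Override boolean oneAppIdPerTask() { return true; }
    }

    /** Assumed shape of a Spark cluster mode task: the only one that exits after submit. */
    static class SparkClusterTaskSketch extends YarnTaskSketch {
        @Override boolean exitAfterSubmitTask() { return true; }
    }

    /** FlinkStreamTask opts out for now because it confuses appid and jobid. */
    static class FlinkStreamTaskSketch extends YarnTaskSketch {
        @Override boolean oneAppIdPerTask() { return false; }
    }

    /** Any task that keeps the defaults, for example a local shell-style task (assumption). */
    static class LocalTaskSketch extends TaskSketch { }

    enum Action { KEEP_RUNNING, KILL_AND_RESTART }

    /** Worker crash (with or without a master crash): recover by appid if there is exactly one. */
    static Action onWorkerCrash(TaskSketch task) {
        return task.oneAppIdPerTask() ? Action.KEEP_RUNNING : Action.KILL_AND_RESTART;
    }

    /** Master-only crash: every task keeps running once the channel to the worker is rebuilt. */
    static Action onMasterCrash(TaskSketch task) {
        return Action.KEEP_RUNNING;
    }

    public static void main(String[] args) {
        TaskSketch[] tasks = {
            new YarnTaskSketch(), new SparkClusterTaskSketch(),
            new FlinkStreamTaskSketch(), new LocalTaskSketch()
        };
        for (TaskSketch task : tasks) {
            System.out.println(task.getClass().getSimpleName()
                + " workerCrash=" + onWorkerCrash(task)
                + " masterCrash=" + onMasterCrash(task)
                + " exitAfterSubmit=" + task.exitAfterSubmitTask());
        }
    }
}
```

Keeping both defaults at false means a task type gets the conservative kill-and-restart behavior unless it explicitly opts in to appid-based recovery.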
