hzyangkai opened a new pull request, #13202:
URL: https://github.com/apache/dolphinscheduler/pull/13202

   This pull request implements the basic goals of the design document in issue 
https://github.com/apache/dolphinscheduler/issues/12968
   
   When the worker crashes, tasks running on YARN keep running; all other 
tasks are killed and restarted.
   When the master crashes, all tasks keep running.
   When both the master and the worker crash, tasks running on YARN keep 
running; all other tasks are killed and restarted.
   
   ## Purpose of the pull request
   
   
   ## Brief change log
   
   Adds two methods to the class AbstractTask, each with a default value that 
subclasses can override.
   
   1. AbstractTask#oneAppIdPerTask: declares that the task is guaranteed to 
generate exactly one appid. This method affects fault tolerance.
     1. If a task subclass sets oneAppIdPerTask=true, the appid is collected 
and reported when the task starts, and fault tolerance is then performed based 
on that appid. AbstractYarnTask sets oneAppIdPerTask=true by default. The 
current FlinkStreamTask implementation confuses the appid with the jobid, so 
FlinkStreamTask sets oneAppIdPerTask=false for now; it should be reworked later 
so that it can set oneAppIdPerTask=true.
     2. If a task subclass does not override oneAppIdPerTask, the default 
oneAppIdPerTask=false applies and no appid is collected when the task starts. 
When the worker crashes, the task is killed remotely over ssh with 
kill -9 processId, and a new task is then started.
     
   2. AbstractTask#exitAfterSubmitTask: declares that the submitting process 
exits immediately after the task is submitted. This method is an optional 
optimization of the submission path. The default value is false. Currently, 
only Spark tasks in cluster mode set it to true.
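
   The two hooks described above might look roughly like the following. This is 
a minimal sketch, not the code in this pull request: it assumes both methods 
are plain boolean methods with a `false` default, and the class hierarchy shown 
(FlinkStreamTask and SparkClusterTask extending AbstractYarnTask, plus a 
ShellTask) is a hypothetical stand-in used only for illustration.

```java
// Hypothetical sketch of the two hooks; the real signatures may differ.
abstract class AbstractTask {
    /** True if the task is guaranteed to generate exactly one appid. */
    boolean oneAppIdPerTask() {
        return false; // default: no appid is collected at task start
    }

    /** True if the submitting process exits right after submission. */
    boolean exitAfterSubmitTask() {
        return false; // default: the submitting process stays alive
    }
}

// YARN tasks report a single appid by default.
abstract class AbstractYarnTask extends AbstractTask {
    @Override
    boolean oneAppIdPerTask() {
        return true;
    }
}

// Opts out until appid and jobid handling is separated.
class FlinkStreamTask extends AbstractYarnTask {
    @Override
    boolean oneAppIdPerTask() {
        return false;
    }
}

// Illustrative name for a Spark task submitted in cluster mode.
class SparkClusterTask extends AbstractYarnTask {
    @Override
    boolean exitAfterSubmitTask() {
        return true;
    }
}

// Illustrative non-YARN task that keeps both defaults.
class ShellTask extends AbstractTask {
}
```

   Keeping both defaults at `false` means existing task types keep their 
current kill-and-restart behaviour unless they opt in explicitly.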
   
   ## Verify this pull request
   
   Master crashes:
   1. When the master crashes and is then restarted, the channel to the worker 
is rebuilt for all types of tasks, and they keep running.
   
   Worker crashes:
   1. When the worker crashes, tasks that implement oneAppIdPerTask=true can 
keep running. All other tasks are killed and restarted.
   
   Master & Worker crash:
   1. When both the master and the worker crash, tasks that implement 
oneAppIdPerTask=true can keep running. All other tasks are killed and 
restarted.
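
   The three scenarios above reduce to a small decision rule, sketched here 
under stated assumptions. The names `CrashScenario`, `Outcome`, and 
`FailoverRule.decide` are hypothetical and do not appear in the pull request; 
the sketch only encodes the expected outcomes listed in this section.

```java
// Hypothetical model of the expected outcomes; not the scheduler's failover code.
enum CrashScenario { MASTER, WORKER, MASTER_AND_WORKER }

enum Outcome { KEEP_RUNNING, KILL_AND_RESTART }

final class FailoverRule {
    private FailoverRule() {
    }

    /**
     * Returns the expected outcome for a task, given which component crashed
     * and whether the task implements oneAppIdPerTask=true.
     */
    static Outcome decide(CrashScenario scenario, boolean oneAppIdPerTask) {
        if (scenario == CrashScenario.MASTER) {
            // A master-only crash leaves every task running.
            return Outcome.KEEP_RUNNING;
        }
        // Whenever the worker is lost, only tasks tracked by appid survive.
        return oneAppIdPerTask ? Outcome.KEEP_RUNNING : Outcome.KILL_AND_RESTART;
    }
}
```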


