hzyangkai commented on PR #13030:
URL:
https://github.com/apache/dolphinscheduler/pull/13030#issuecomment-1354155901
This PR adds two abstraction methods to the class AbstractTask.
1. AbstractTask#oneAppIdPerTask: confirms that the task generates only one
appid. This method affects fault tolerance.
   1. If the task subclass implements oneAppIdPerTask=true, it can collect an
   appid and report it when the task starts. Fault tolerance is then performed
   based on that appid. By default, AbstractYarnTask#oneAppIdPerTask=true.
   The original FlinkStreamTask implementation is not good enough: it
   confuses the appid and the jobid. Therefore FlinkStreamTask#oneAppIdPerTask=false
   for now, and the implementation of FlinkStreamTask should be changed later
   so that it can set oneAppIdPerTask=true.
   2. If the task subclass does not implement oneAppIdPerTask, the default
   setting oneAppIdPerTask=false is used. Appids are not collected when the
   task starts. When the worker crashes, the task is killed remotely over ssh
   with kill -9 processId, and a new task is then started.
2. AbstractTask#exitAfterSubmitTask: the submitting process exits
immediately after the task is submitted. This method is used to optimize the
submission method and is optional. The default value is false. Currently, it
is true only for Spark tasks in cluster mode.
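
A minimal sketch of how the two flags described above could be laid out. It is
not the code from this PR: the class names ending in "Sketch" are hypothetical
stand-ins, the boolean no-argument method signatures are an assumption, and the
class hierarchy (the Flink stream and Spark tasks extending the YARN task) is
assumed for illustration only.

```java
// Hypothetical sketch; these are NOT the real DolphinScheduler classes.
public class TaskFlagsSketch {

    /** Stand-in for AbstractTask: both flags default to false. */
    static abstract class AbstractTaskSketch {
        /** True if the task is guaranteed to generate exactly one appid. */
        public boolean oneAppIdPerTask() {
            return false;
        }

        /** True if the submitting process exits right after submission. */
        public boolean exitAfterSubmitTask() {
            return false;
        }
    }

    /** Stand-in for AbstractYarnTask: one appid per task by default. */
    static abstract class AbstractYarnTaskSketch extends AbstractTaskSketch {
        @Override
        public boolean oneAppIdPerTask() {
            return true;
        }
    }

    /** Stand-in for FlinkStreamTask: opts out until appid and jobid are separated. */
    static class FlinkStreamTaskSketch extends AbstractYarnTaskSketch {
        @Override
        public boolean oneAppIdPerTask() {
            return false;
        }
    }

    /** Stand-in for a Spark task: exits after submit only in cluster mode. */
    static class SparkTaskSketch extends AbstractYarnTaskSketch {
        private final boolean clusterMode;

        SparkTaskSketch(boolean clusterMode) {
            this.clusterMode = clusterMode;
        }

        @Override
        public boolean exitAfterSubmitTask() {
            return clusterMode;
        }
    }

    /** Stand-in for any task that keeps both defaults. */
    static class PlainTaskSketch extends AbstractTaskSketch {
    }

    public static void main(String[] args) {
        AbstractTaskSketch[] tasks = {
            new PlainTaskSketch(),
            new FlinkStreamTaskSketch(),
            new SparkTaskSketch(true),
            new SparkTaskSketch(false)
        };
        for (AbstractTaskSketch task : tasks) {
            System.out.println(task.getClass().getSimpleName()
                + " oneAppIdPerTask=" + task.oneAppIdPerTask()
                + " exitAfterSubmitTask=" + task.exitAfterSubmitTask());
        }
    }
}
```

Keeping false as the base default means a task type only opts into
appid-based fault tolerance once it can guarantee a single appid.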
Test:
Master crashes:
1. When the master crashes and then restarts, all types of tasks rebuild
the channel to the worker and keep running.
Worker crashes:
1. When the worker crashes, a task implementing oneAppIdPerTask=true keeps
running. Otherwise, it is killed and restarted.
Master & Worker crash:
1. When both the master and the worker crash, a task implementing
oneAppIdPerTask=true keeps running. Otherwise, it is killed and
restarted.
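
The three test scenarios above can be summarized as a small decision table.
The sketch below only encodes the outcomes reported in this comment. The class
FailoverSketch, the enum Action, and the method names are hypothetical and do
not exist in DolphinScheduler.

```java
// Hypothetical summary of the reported test outcomes; not DolphinScheduler code.
public class FailoverSketch {

    enum Action { KEEP_RUNNING, KILL_AND_RESTART }

    /** Master crash: every task type rebuilds its channel and keeps running. */
    static Action onMasterCrash(boolean oneAppIdPerTask) {
        return Action.KEEP_RUNNING;
    }

    /** Worker crash: only tasks with a single reported appid can be recovered. */
    static Action onWorkerCrash(boolean oneAppIdPerTask) {
        return oneAppIdPerTask ? Action.KEEP_RUNNING : Action.KILL_AND_RESTART;
    }

    /** Master and worker crash: same outcome as a worker crash. */
    static Action onMasterAndWorkerCrash(boolean oneAppIdPerTask) {
        return onWorkerCrash(oneAppIdPerTask);
    }

    public static void main(String[] args) {
        for (boolean flag : new boolean[] {true, false}) {
            System.out.println("oneAppIdPerTask=" + flag
                + " master=" + onMasterCrash(flag)
                + " worker=" + onWorkerCrash(flag)
                + " both=" + onMasterAndWorkerCrash(flag));
        }
    }
}
```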