legen618 opened a new issue #3575:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3575


   # Description of the feature
   
   Overall, forced success is split into two sub-features:
   
   1. Add a new task status called `forced_success`. A task that failed in a 
workflow can be manually set to `forced_success`. If the workflow is still 
running, the tasks that depend on it can then continue to execute.
   2. Add a new execution command called `resume_from_forced_success`. When a 
workflow has stopped and contains a node that was forced to succeed, the 
execution of the workflow can be resumed.
   
   # Implementation of the feature
   
   One new interface is added and two existing interfaces are modified. For the 
interface documentation, see 
https://dsfts.w.eolinker.com/#/share/index?shareCode=GmFXMM
   
   1. **Added an interface to modify the task status. For task instances whose 
state is typeIsFailure, the status can be changed to FORCED_SUCCESS**
   
   Specifically, the status of the failed task instance is modified directly at 
the API-SERVER layer, and the operation is logged in the API-SERVER. With this 
as the precondition, the backend handles two cases:
   
   - If the workflow instance to which the task instance belongs **is in a 
stopped state**:
   
     The workflow can continue running from the node that was forced to 
succeed; see item "2" below.
   
   - If the corresponding workflow instance **is running**:
   
     masterExecThread keeps checking the database for failed tasks in the 
completeTaskList that have been forced to succeed. Once it detects one, it 
goes on to submit the nodes that follow that task instance. **For tasks with 
failure retry configured**, if the task is currently within its retry interval 
and the user forces the last task instance to succeed, the remaining retries 
are skipped and the subsequent nodes are submitted instead.
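   
   The two cases above can be sketched roughly as follows. This is a minimal 
Python illustration, not the project's actual code: `ExecutionStatus`, 
`TaskInstance`, `force_task_success` and `next_action` are hypothetical names, 
`type_is_failure` is simplified to a single failure state, and the database 
polling done by masterExecThread is reduced to one decision function.

```python
from dataclasses import dataclass
from enum import Enum


class ExecutionStatus(Enum):
    # Simplified, hypothetical subset of task states.
    SUCCESS = "success"
    FAILURE = "failure"
    FORCED_SUCCESS = "forced_success"

    def type_is_failure(self) -> bool:
        # Assumption: only FAILURE counts as a failure state in this sketch.
        return self is ExecutionStatus.FAILURE


@dataclass
class TaskInstance:
    name: str
    state: ExecutionStatus
    retries_left: int = 0


def force_task_success(task: TaskInstance) -> None:
    """API-server side: only a failed task instance may be forced to succeed."""
    if not task.state.type_is_failure():
        raise ValueError(f"task {task.name} is not in a failure state")
    task.state = ExecutionStatus.FORCED_SUCCESS


def next_action(task: TaskInstance) -> str:
    """Master side: decide what to do with a completed task on each check."""
    if task.state in (ExecutionStatus.SUCCESS, ExecutionStatus.FORCED_SUCCESS):
        # A forced success is treated like a success: move on to successors,
        # even if retries were still pending.
        return "submit_successors"
    if task.retries_left > 0:
        return "retry"
    return "fail_workflow"
```

   In this sketch, forcing a task that is still waiting for a retry changes 
the decision from `retry` to `submit_successors`, which mirrors the retry case 
described above.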
   
   2. **Modified the existing execute interface of the process and added a new 
commandType. This operation can be triggered when the workflow has failed and 
contains valid task instances that were forced to succeed.**
   
   Specifically, once the operation is triggered, all valid task instances 
from the previous execution are loaded, the entire processInstanceJson is built 
into a DAG, and the nodes that follow the forced-success nodes can then 
continue to execute. Nodes such as sub-process and condition are also 
supported.
   
   The final state of the process is determined as in these examples:
   
      - A -> B -> C: A succeeds and B is forced to succeed. If C then executes 
successfully, the status of processInstance is success.
      - A -> B -> C and, in parallel, A -> D -> E: A succeeds, B is forced to 
succeed, and D has also failed. In this case the operation only triggers C. 
Even if C executes successfully, the status of the whole processInstance is 
still failure, because E never got to run. The user can then choose 
**recovery failed**, or **force D to succeed and then trigger the resume 
operation again**.
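   
   The resume behaviour and the two examples above can be modelled with a 
small sketch. It is an illustration under assumptions, not the scheduler's real 
algorithm: the DAG is a plain dict of node to successors, 
`resume_from_forced_success` and `run_task` are hypothetical names, and a node 
runs only when all of its parents are in `success` or `forced_success`.

```python
DONE = {"success", "forced_success"}


def parents_of(dag):
    """Invert a {node: [successors]} mapping into {node: [parents]}."""
    parents = {node: [] for node in dag}
    for node, children in dag.items():
        for child in children:
            parents[child].append(node)
    return parents


def resume_from_forced_success(dag, states, run_task):
    """Run every not-yet-run node whose parents all count as successful.

    dag:      {node: [successor, ...]}, with every node present as a key
    states:   {node: "success" | "failure" | "forced_success"} from the
              previous execution; nodes that never ran are absent
    run_task: callable(node) -> bool, True if the task succeeded
    """
    states = dict(states)  # do not mutate the caller's record
    parents = parents_of(dag)
    progressed = True
    while progressed:
        progressed = False
        for node in dag:
            if node in states:
                continue  # already has a result from this or a previous run
            if all(states.get(p) in DONE for p in parents[node]):
                states[node] = "success" if run_task(node) else "failure"
                progressed = True
    # The process succeeds only if every node ended in a successful state.
    final = "success" if all(states.get(n) in DONE for n in dag) else "failure"
    return final, states


# Second example from the text: B was forced to succeed, D is still failed.
dag = {"A": ["B", "D"], "B": ["C"], "C": [], "D": ["E"], "E": []}
previous = {"A": "success", "B": "forced_success", "D": "failure"}
final, result = resume_from_forced_success(dag, previous, lambda node: True)
```

   Here only C is run; E stays unexecuted because D is still failed, so 
`final` is `"failure"`. Marking D as `forced_success` before resuming lets E 
run as well.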
   
   3. **Modified the return value of taskStateCount in the existing 
dataAnalysis. This interface counts the number of tasks in each state under a 
project**. Because there is now an extra ***forced success*** status, the 
return value of this interface has been changed accordingly.
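   
   As a rough illustration of why the return value changes, the sketch below 
counts tasks per state and always reports every known state, including the new 
one. The state list and the function name `task_state_count` are hypothetical 
and much shorter than the real set of task states; the actual response format 
is described in the interface documentation linked above.

```python
from collections import Counter

# Hypothetical, shortened list of task states; "forced_success" is the new one.
ALL_STATES = ["success", "failure", "forced_success"]


def task_state_count(task_states):
    """Count tasks per state, reporting zero for states that do not occur."""
    counts = Counter(task_states)
    return {state: counts.get(state, 0) for state in ALL_STATES}
```

   Any consumer that iterates over a fixed set of states would need to handle 
the additional `forced_success` entry.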
   
   In addition to the above, some classes affected by the new commandType were 
also modified.

