vanilla111 opened a new pull request #3421: URL: https://github.com/apache/incubator-dolphinscheduler/pull/3421
## What is the purpose of the pull request

- Added a function to delay the execution of tasks;
- Added the corresponding unit test;
- Added statistics on delayed execution of tasks;
- Changed the point at which the start time of the various task types is set;
- Fixed the bug that the `switch...case` statement in `NettyDecoder.java` lacks `break`;
- Fixed the error that `@ContextConfiguration` lacks `CuratorZookeeperClient.class` in multiple test files;
- Added the `getRemainTime` method to `DateUtils`, and changed the code that calculates the remaining time to use the new utility method.

## Brief change log

- `dolphinscheduler-api/src/main/java/org/apache/dolphinscheduler/api/dto/TaskCountDto.java` added statistics of delayed execution tasks and fixed a spelling error;
- `dolphinscheduler-common/src/main/java/org/apache/dolphinscheduler/common/enums/ExecutionStatus.java` added the delayed execution status;
- `dolphinscheduler-common/src/main/java/org/apache/dolphinscheduler/common/model/TaskNode.java` added a delay attribute `delayTime`;
- `dolphinscheduler-dao/src/main/java/org/apache/dolphinscheduler/dao/entity/TaskInstance.java` added the delay property `delayTime` and the first submission time `firstSubmitTime`;
- Fixed the bug that the decode method lacks `break` in `dolphinscheduler-remote/src/main/java/org/apache/dolphinscheduler/remote/codec/NettyDecoder.java`;
- `dolphinscheduler-server/src/main/java/org/apache/dolphinscheduler/server/builder/TaskExecutionContextBuilder.java` added the delay property `delayTime` and the current task status property `currentExecutionStatus`;
- Added unit test `dolphinscheduler-server/src/test/java/org/apache/dolphinscheduler/server/worker/runner/TaskExecuteThreadTest.java`;
- The `createTaskInstance` method in `dolphinscheduler-server/src/main/java/org/apache/dolphinscheduler/server/master/runner/MasterExecThread.java` no longer sets the start time;
- The condition node execution thread and the dependent node execution thread each set the start time independently;
- Added checks for the delayed execution status `DELAY_EXECUTION` in multiple files;
- `dolphinscheduler-server/src/main/java/org/apache/dolphinscheduler/server/worker/runner/TaskExecuteThread.java` added the delayed execution code;
- Added `CuratorZookeeperClient.class` to `@ContextConfiguration` in multiple test files;
- Added the fields `delay_time` and `first_submit_time` to the `t_ds_task_instance` database table, and modified the corresponding SQL files;
- The IDE automatically optimized imports (my setting is that fewer than 15 imports from the same package do not use `*`, and JDK imports go at the top of the file).

## Verify this pull request

- All specified unit tests pass;
- Passed the black box test;

Here are the details of the black box test:

### Test environment

- Operating system: macOS
- Number of Master Nodes: 2
- Number of Worker Nodes: 2
- Worker execution task type: Python

### Test details

For the following tests, the check that prevents failover under the same HOST was temporarily modified. All node crashes were triggered with signal `-9` (SIGKILL).

Note: If a task needs to be executed later, the WORKER changes the status of the task to `DELAY_EXECUTION` when it sends the first ACK; after the delay ends and before execution starts, it sends a second ACK that changes the status of the task to `RUNNING_EXECUTION` and sets the start time. Tasks that do not need to be delayed send only one ACK.
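
The two-ACK behaviour described in the note can be illustrated with a minimal Java sketch. It is not the code from this pull request: the class name, the `remainSeconds` helper (a stand-in for the `getRemainTime` method mentioned above, whose real signature is not shown here), and the choice of seconds as the delay unit are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and units are assumptions, not the PR's actual code.
class DelayExecutionSketch {

    /** Only the two task states named in the note above. */
    enum ExecutionStatus { DELAY_EXECUTION, RUNNING_EXECUTION }

    /**
     * Seconds left until (firstSubmitMillis + delaySeconds) has passed, never negative.
     * Stand-in for a getRemainTime-style helper; the delay unit (seconds) is assumed.
     */
    static long remainSeconds(long firstSubmitMillis, long delaySeconds, long nowMillis) {
        long elapsedSeconds = (nowMillis - firstSubmitMillis) / 1000;
        return Math.max(0, delaySeconds - elapsedSeconds);
    }

    /**
     * Statuses a worker would ACK, in order: a delayed task reports
     * DELAY_EXECUTION first and RUNNING_EXECUTION after the wait;
     * a task with no remaining delay reports RUNNING_EXECUTION only.
     */
    static List<ExecutionStatus> ackSequence(long firstSubmitMillis, long delaySeconds, long nowMillis) {
        List<ExecutionStatus> acks = new ArrayList<>();
        if (remainSeconds(firstSubmitMillis, delaySeconds, nowMillis) > 0) {
            acks.add(ExecutionStatus.DELAY_EXECUTION);   // first ACK, start time not set yet
        }
        acks.add(ExecutionStatus.RUNNING_EXECUTION);     // ACK sent right before execution starts
        return acks;
    }

    public static void main(String[] args) {
        long submitted = System.currentTimeMillis();
        System.out.println(ackSequence(submitted, 60, submitted)); // delayed task: two ACKs
        System.out.println(ackSequence(submitted, 0, submitted));  // no delay: one ACK
    }
}
```

Computing the remaining time from the first submission time, rather than from the moment the worker picks the task up, means a task resubmitted after a failover waits only for what is left of its delay. Whether the pull request does exactly this is not stated here, so treat it as one plausible reading of why `firstSubmitTime` is stored.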

#### Simple single task

Test case | Master status | Worker status | Expected result | Test result | Description
:-: | :-: | :-: | :-: | :-: | :-:
Dependent node | Normal | Normal | The dependency is handled correctly, the start and end times are correct | As expected
Condition node | Normal | Normal | The condition relationship is processed correctly, the start and end times are correct | As expected
Subprocess node | Normal | Normal | Subprocess execution is normal, the start and end times are correct | As expected
Python node | Normal | Normal | The task executes correctly, the start and end times are correct | As expected

#### Task delay execution

Test case | Master status | Worker status | Expected result | Test result | Description
:-: | :-: | :-: | :-: | :-: | :-:
Python node, delay execution for one minute, and the task content is sleep for one minute | Normal | Normal | The task executes correctly, and the task status queried at each point in time matches the real situation | As expected | SUBMITTED\_SUCCESS -> DELAY\_EXECUTION -> RUNNING\_EXECUTION -> SUCCESS status changed correctly

#### WORKER node crash test

Test case | Master status | Worker status | Expected result | Test result | Description
:-: | :-: | :-: | :-: | :-: | :-:
Python node, delay execution for one minute, and the task content is sleep for one minute | Normal | The node receiving the task crashes before the first ACK | The host is empty, so failover is not triggered | As expected | If the Master is restarted at this point, the task is resubmitted
Same as above | Normal | The node that receives the task crashes before the second ACK | Triggers failover | As expected | The Master submits a new identical task
Same as above | Normal | The node that receives the task crashes while executing the task | Triggers failover | As expected | The Master submits a new identical task

#### MASTER node crash test

Test case | Master status | Worker status | Expected result | Test result | Description
:-: | :-: | :-: | :-: | :-: | :-:
Python node, delay execution for one minute, and the task content is sleep for one minute | The node dispatching the task crashes before the first ACK | Normal | Another node takes over the task until the task is completed | As expected |
Same as above | The node that dispatches the task crashes before the second ACK | Normal | Another node takes over the task | As expected |
Same as above | The node that dispatches the task crashes during the execution of the task | Normal | Another node takes over the task | As expected |

In the above tests, the various times of the workflow and task instances were set at the correct points.
