vanilla111 opened a new issue #3352: URL: https://github.com/apache/incubator-dolphinscheduler/issues/3352
I wrote a function to delay the execution of the task, but in my discussion with my mentor, there was a disagreement on how to assign the start time in the task instance. ## Problem Description In the original design, the start time of the task instance will be initialized or changed in several places: - In the `createTaskInstance()` method in `dolphinscheduler-server/.../server/master/runner/MasterExecThread.java` - `dolphinscheduler-server/.../server/master/runner/SubProcessTaskExecThread.java` will update the start time once the task is submitted - If a failed task tries to retry, it will be updated in the `submitTaskInstanecToDB()` method in `dolphinscheduler-service/.../service/process/ProcessService.java` - After the worker is dispatched, an ACK is sent in the process() method in `dolphinscheduler-server/.../server/worker/processor/TaskExecuteProcessor.java` to update the start time Most tasks (such as tasks that are dispatched to workers) have two start times from the creation of an instance to the completion of execution. Let us briefly analyze what problems will arise in this way. For example, there is a function to view the running time of a task that uses this start time, and then a task is created at 10:00, and a worker receives the task at 10:05. According to the original design, I can see that the task has been running for 4 minutes at 10:04, but I will see that the task has been running for 1 minute at 10:06 (because the worker will send an ACK to update the start time at 10:05), which is obviously unreasonable. Let me first define a few key time points (condition nodes and dependent nodes are considered separately): 1. Time when the master created the task instance 2. Worker receiving task instance time 3. Worker delay execution start time 4. Time when the worker starts to execute the task 5. Worker execution task completion time Then define the time between each time point: - 1->2: Waiting time in the queue (including submission to the database, removal from the database, submission to the task queue, and removal from the task queue) - 3->4: Delay execution time - 4->5: Task execution time ## My mentor's point of view He defines startTime in TaskInstance as time point 4. Before this time point, this value should not be `null`, and thinks that it is better to initialize to `new Date()`. At the same time, he believes that the original design should be maintained, and the value can be updated multiple times if necessary. ## My point of view I also defined the startTime in TaskInstance as time point 4, but I think this value should remain `null` before this time point, and this value is assigned only once in the life cycle of a task instance. Will no longer be updated afterwards. My reasons are as follows: - A variable should not be assigned before it meets its definition; - Multiple updates will lead to inconsistencies in reading data multiple times, such as the example of checking the running time of the task mentioned earlier; - There is a member submitTime in TaskInstance. In the original design, its value is very close to startTime. When the system load is small, they are almost always the same. So where you need to use before startTime initialization, you can use submitTime instead; - However, if the startTime before initialization needs to be used somewhere in the code, then whether this time should be called startTime is open to question; - Use IDEA for code analysis, startTime is rarely used. In addition, I found that in the code for viewing the Gantt chart, there is a line of code that is written like this: ``` Date startTime = taskInstance.getStartTime() == null? New Date(): taskInstance.getStartTime(); ``` It is considered here that startTime can be `null`, which supports my point from the side. The above questions can be summed up as follows: The original design of the system is ambiguous in the definition of the task start time. ## Impact of improvement The code of startTime that has not been initialized may be used. I found two places: 1. `checkTaskAfterWorkerStart()` in `dolphinscheduler-server/../server/zk/ZKMasterClient.java` 2. `getRemaintime()` in `dolphinscheduler-server/.../server/master/runner/MasterTaskExecThread.java` For the first part, since I am not familiar with this part of the code, I don't know whether adding a judgment can complete the original logic. But for the second part, since the worker may crash at any time between time 2 and 5, if the startTime is initialized without sending an ACK before the crash, the master will not be able to determine whether it has timed out and take corresponding actions. I do not have a good solution to this problem for the time being, and welcome your comments. ## At last For the startTime of taskInstance, how do you think it is better to initialize? Do I need to make improvements based on my point of view? Or whether to keep the original design? Or have other suggestions? ### PS: The same problem exists for the endTime of taskInstance. After the worker finishes executing the task, the sendResult will update this time, but the master will also update this time after waitTaskQuit. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
