vanilla111 opened a new issue #3352:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/3352


   I wrote a function to delay the execution of the task, but in my discussion 
with my mentor, there was a disagreement on how to assign the start time in the 
task instance.
   
   
   ## Problem Description
   
   In the original design, the start time of the task instance will be 
initialized or changed in several places:
   
   - In the `createTaskInstance()` method in 
`dolphinscheduler-server/.../server/master/runner/MasterExecThread.java`
   - 
`dolphinscheduler-server/.../server/master/runner/SubProcessTaskExecThread.java`
 will update the start time once the task is submitted
   - If a failed task tries to retry, it will be updated in the 
`submitTaskInstanecToDB()` method in 
`dolphinscheduler-service/.../service/process/ProcessService.java`
   - After the worker is dispatched, an ACK is sent in the process() method in 
`dolphinscheduler-server/.../server/worker/processor/TaskExecuteProcessor.java` 
to update the start time
   
   Most tasks (such as tasks that are dispatched to workers) have two start 
times from the creation of an instance to the completion of execution. Let us 
briefly analyze what problems will arise in this way. For example, there is a 
function to view the running time of a task that uses this start time, and then 
a task is created at 10:00, and a worker receives the task at 10:05. According 
to the original design, I can see that the task has been running for 4 minutes 
at 10:04, but I will see that the task has been running for 1 minute at 10:06 
(because the worker will send an ACK to update the start time at 10:05), which 
is obviously unreasonable.
   
   Let me first define a few key time points (condition nodes and dependent 
nodes are considered separately):
   
   1. Time when the master created the task instance
   2. Worker receiving task instance time
   3. Worker delay execution start time
   4. Time when the worker starts to execute the task
   5. Worker execution task completion time
   
   Then define the time between each time point:
   
   - 1->2: Waiting time in the queue (including submission to the database, 
removal from the database, submission to the task queue, and removal from the 
task queue)
   - 3->4: Delay execution time
   - 4->5: Task execution time
   
   ## My mentor's point of view
   
   He defines startTime in TaskInstance as time point 4. Before this time 
point, this value should not be `null`, and thinks that it is better to 
initialize to `new Date()`. At the same time, he believes that the original 
design should be maintained, and the value can be updated multiple times if 
necessary.
   
   ## My point of view
   
   I also defined the startTime in TaskInstance as time point 4, but I think 
this value should remain `null` before this time point, and this value is 
assigned only once in the life cycle of a task instance. Will no longer be 
updated afterwards.
   
   My reasons are as follows:
   
   - A variable should not be assigned before it meets its definition;
   - Multiple updates will lead to inconsistencies in reading data multiple 
times, such as the example of checking the running time of the task mentioned 
earlier;
   - There is a member submitTime in TaskInstance. In the original design, its 
value is very close to startTime. When the system load is small, they are 
almost always the same. So where you need to use before startTime 
initialization, you can use submitTime instead;
   - However, if the startTime before initialization needs to be used somewhere 
in the code, then whether this time should be called startTime is open to 
question;
   - Use IDEA for code analysis, startTime is rarely used.
   
   In addition, I found that in the code for viewing the Gantt chart, there is 
a line of code that is written like this:
   
   ```
   Date startTime = taskInstance.getStartTime() == null? New Date(): 
taskInstance.getStartTime();
   ```
   
   It is considered here that startTime can be `null`, which supports my point 
from the side.
   
   The above questions can be summed up as follows: The original design of the 
system is ambiguous in the definition of the task start time.
   
   
   ## Impact of improvement
   
   The code of startTime that has not been initialized may be used. I found two 
places:
   
   1. `checkTaskAfterWorkerStart()` in 
`dolphinscheduler-server/../server/zk/ZKMasterClient.java`
   2. `getRemaintime()` in 
`dolphinscheduler-server/.../server/master/runner/MasterTaskExecThread.java`
   
   
   For the first part, since I am not familiar with this part of the code, I 
don't know whether adding a judgment can complete the original logic. But for 
the second part, since the worker may crash at any time between time 2 and 5, 
if the startTime is initialized without sending an ACK before the crash, the 
master will not be able to determine whether it has timed out and take 
corresponding actions. I do not have a good solution to this problem for the 
time being, and welcome your comments.
   
   ## At last
   
   For the startTime of taskInstance, how do you think it is better to 
initialize? Do I need to make improvements based on my point of view? Or 
whether to keep the original design? Or have other suggestions?
   
   ### PS:
   
   The same problem exists for the endTime of taskInstance. After the worker 
finishes executing the task, the sendResult will update this time, but the 
master will also update this time after waitTaskQuit.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to