eskarimov commented on a change in pull request #20062:
URL: https://github.com/apache/airflow/pull/20062#discussion_r764870343



##########
File path: airflow/models/taskinstance.py
##########
@@ -1200,7 +1200,8 @@ def check_and_change_state_before_execution(
             # Attempt 0 for the first attempt).
             # Set the task start date. In case it was re-scheduled use the 
initial
             # start date that is recorded in task_reschedule table
-            self.start_date = timezone.utcnow()
+            # If the task continues after being deferred (next_method is set), 
use the original start_date
+            self.start_date = self.start_date if self.next_method else 
timezone.utcnow()

Review comment:
       If `start_date` would be set to `None` when the scheduler goes to defer 
the task, then we won't be able to know what was the original `start_date`, 
i.e. when the task before deferral started.
   Because we want to count the overall execution time, including time before 
deferral + defer time.
   
   Sorry, I haven't explained it correctly what I meant about clearing 
`start_date` in database: 
   it's getting tricky, when the task is up for retry.
   In case of the second and further attempts `refresh_from_db()` sets 
`start_date` equal to the previous try attempt, i.e. it'd be `not None`.
   Which means if we use the snippet above
   ```python
   if self.start_date is None:
       self.start_date = timezone.utcnow()
   ```
   then in case of re-try, the task would continue execution with the 
`start_date` from the previous attempt, which is wrong.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to