eskarimov commented on a change in pull request #20062:
URL: https://github.com/apache/airflow/pull/20062#discussion_r764726355



##########
File path: airflow/models/taskinstance.py
##########
@@ -1200,7 +1200,8 @@ def check_and_change_state_before_execution(
             # Attempt 0 for the first attempt).
             # Set the task start date. In case it was re-scheduled use the 
initial
             # start date that is recorded in task_reschedule table
-            self.start_date = timezone.utcnow()
+            # If the task continues after being deferred (next_method is set), 
use the original start_date
+            self.start_date = self.start_date if self.next_method else 
timezone.utcnow()

Review comment:
       I'm a bit confused here now:
   - There's no other indicator aside `TaskInstance.next_method` to use for 
identifying if the task was returned after deferral.
   - A task instance is refreshed from DB each time it starts. It happens in 
function `check_and_change_state_before_execution()`
   
https://github.com/apache/airflow/blob/1349cc62df9fa872fd24865ade7b281bb229d70c/airflow/models/taskinstance.py#L1168-L1174
   where it got `start_date` assigned from the previous run.
   If we want to use the snippet from above with `if self.start_date is None:`, 
then `start_date` needs to be `None` in DB at the moment of refresh, but it 
doesn't feel right to set it to `None` in DB before the new task instance has 
actually started executing.
   - `TaskInstance.set_state()` isn't used inside `TaskInstance` class, I've 
checked for usages, used in different places like `airflow.jobs`, 
`airflow.executors.debug_executor`, etc.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to