Copilot commented on code in PR #64816:
URL: https://github.com/apache/airflow/pull/64816#discussion_r3066476376


##########
airflow-core/src/airflow/api_fastapi/execution_api/routes/task_instances.py:
##########
@@ -172,6 +172,20 @@ def ti_run(
     if ti.next_kwargs:
         data.pop("start_date")
         log.debug("Removed start_date from update as task is resuming from 
deferral")
+    elif "start_date" in data:
+        # Restore original start_date for rescheduled tasks. The supervisor 
always sends
+        # start_date=utcnow(), but for a sensor in reschedule mode the true 
start is when
+        # the first poke ran, recorded in TaskReschedule. Without this, every 
re-poke
+        # resets start_date and inflates dagrun.first_task_scheduling_delay.
+        first_reschedule_start_date = session.scalar(
+            select(TaskReschedule.start_date)
+            .where(TaskReschedule.ti_id == task_instance_id)
+            .order_by(TaskReschedule.id.asc())

Review Comment:
   This query is not scoped to the current `try_number`, so if the same 
`TaskInstance` has `TaskReschedule` rows from previous tries, it may restore a 
stale `start_date` from an earlier try. To fix, filter `TaskReschedule` by the 
current TI `try_number` (or reuse `TaskReschedule.stmt_for_task_instance(...)` 
which is already try-scoped), and consider ordering by `start_date` (with `id` 
as a tiebreaker) instead of `id` alone to ensure the earliest reschedule record 
is selected.
   ```suggestion
               .where(
                   TaskReschedule.ti_id == task_instance_id,
                   TaskReschedule.try_number == ti.try_number,
               )
               .order_by(TaskReschedule.start_date.asc(), 
TaskReschedule.id.asc())
   ```



##########
airflow-core/tests/unit/api_fastapi/execution_api/versions/head/test_task_instances.py:
##########
@@ -651,6 +651,63 @@ def test_next_kwargs_determines_start_date_update(self, 
client, session, create_
         ti = session.get(TaskInstance, ti.id)
         assert ti.start_date == expected_start_date
 
+    def test_ti_run_restores_start_date_for_rescheduled_task(
+        self, client, session, create_task_instance, time_machine
+    ):

Review Comment:
   The new behavior in `ti_run` should be validated for retry scenarios: if a 
TI gets a new try (incremented `try_number`) and reschedules again, 
`start_date` should be restored from the first `TaskReschedule` of the *current 
try*, not from prior tries. Adding a test that creates `TaskReschedule` rows 
for multiple tries (same `ti_id`) would prevent regressions and would have 
caught the missing `try_number` filter.



##########
airflow-core/src/airflow/api_fastapi/execution_api/routes/task_instances.py:
##########
@@ -172,6 +172,20 @@ def ti_run(
     if ti.next_kwargs:
         data.pop("start_date")
         log.debug("Removed start_date from update as task is resuming from 
deferral")
+    elif "start_date" in data:
+        # Restore original start_date for rescheduled tasks. The supervisor 
always sends
+        # start_date=utcnow(), but for a sensor in reschedule mode the true 
start is when
+        # the first poke ran, recorded in TaskReschedule. Without this, every 
re-poke
+        # resets start_date and inflates dagrun.first_task_scheduling_delay.
+        first_reschedule_start_date = session.scalar(
+            select(TaskReschedule.start_date)
+            .where(TaskReschedule.ti_id == task_instance_id)
+            .order_by(TaskReschedule.id.asc())
+            .limit(1)
+        )
+        if first_reschedule_start_date:

Review Comment:
   Use an explicit `is not None` check here instead of relying on truthiness, 
to make it clear that only the absence of a row should skip the override.
   ```suggestion
           if first_reschedule_start_date is not None:
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to