ephraimbuddy commented on issue #57618:
URL: https://github.com/apache/airflow/issues/57618#issuecomment-3729237417

   I have opened another PR that specifically addresses the behavior in the log 
you shared. Can you try it: https://github.com/apache/airflow/pull/60330? If it 
fixes the issue and there is a 3.1.6rc2, we will include it there. 
   
   From the log scheduler_race_condition_example.json:
   
   - 01:21:46 Scheduler pod A enqueues the TI as try 1:
     "2026-01-09T01:21:46.380591Z [info     ] Add task 
TaskInstanceKey(dag_id='sadp_dag_196', task_id='task_1',...
   
   - 01:22:54 Scheduler pod B enqueues the same TI again as try 2:
     "2026-01-09T01:22:54.802854Z [info     ] Add task 
TaskInstanceKey(dag_id='sadp_dag_196', task_id='task_1', ...
   
   This second enqueue happens before any worker starts (the first worker 
“Executing workload” line is 01:24:48), so the try bump is happening purely in 
scheduler logic.
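
   To look for the same pattern in other scheduler logs, a rough sketch like the 
one below can help. It assumes the log lines have the shape quoted above and 
only captures dag_id and task_id, since the rest of the key is truncated in the 
excerpt. The function name and regex are illustrative, not part of Airflow, and 
the match is coarse: the same dag_id/task_id can legitimately be enqueued for 
different runs or map indexes, so treat hits as candidates to inspect.

```python
import re
from collections import defaultdict

# Matches the "Add task TaskInstanceKey(...)" lines quoted above. Only dag_id
# and task_id are captured because the rest of the key is truncated there.
ENQUEUE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T[\d:.]+Z)\s+\[\w+\s*\]\s+Add task\s+"
    r"TaskInstanceKey\(dag_id='(?P<dag_id>[^']+)', task_id='(?P<task_id>[^']+)'"
)


def find_repeated_enqueues(lines):
    """Return {(dag_id, task_id): [timestamps]} for keys enqueued more than once."""
    seen = defaultdict(list)
    for line in lines:
        match = ENQUEUE_RE.search(line)
        if match:
            seen[(match["dag_id"], match["task_id"])].append(match["ts"])
    return {key: stamps for key, stamps in seen.items() if len(stamps) > 1}


# The two enqueue lines from the log excerpt, joined onto single lines.
sample = [
    "2026-01-09T01:21:46.380591Z [info     ] Add task "
    "TaskInstanceKey(dag_id='sadp_dag_196', task_id='task_1',...",
    "2026-01-09T01:22:54.802854Z [info     ] Add task "
    "TaskInstanceKey(dag_id='sadp_dag_196', task_id='task_1', ...",
]
repeated = find_repeated_enqueues(sample)
```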
   
   Then:
   
   - 01:24:48 → 01:25:03 the worker runs try 2 and the Execution API marks the 
TI success.
   
   - 01:26:30 a worker pod later starts try 1, but the API rejects it:
         Cannot start Task Instance in invalid state ... previous_state=success
         /run ... 409 Conflict
   
   This happened because DagRun.schedule_tis() updated rows filtering only on 
TI.id.in_(...). In HA, scheduler B can hold a stale in-memory view that still 
treats the TI as schedulable, run schedule_tis(), and increment try_number even 
though scheduler A has already advanced the TI.
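
   The race can be illustrated with a toy model. The sketch below is not 
Airflow code and not necessarily how the PR fixes it: the `ti` table, the 
function names, and the rule "state IS NULL means schedulable" are simplifying 
assumptions. It only contrasts an UPDATE filtered by id alone with one that 
also checks the row's current state.

```python
import sqlite3


def schedule_unguarded(conn, ti_ids):
    """Filter on id only: a caller with a stale view still bumps try_number."""
    marks = ",".join("?" * len(ti_ids))
    cur = conn.execute(
        "UPDATE ti SET state = 'scheduled', try_number = try_number + 1 "
        f"WHERE id IN ({marks})",
        ti_ids,
    )
    return cur.rowcount


def schedule_guarded(conn, ti_ids):
    """Also require that the row is still schedulable (NULL state in this toy)."""
    marks = ",".join("?" * len(ti_ids))
    cur = conn.execute(
        "UPDATE ti SET state = 'scheduled', try_number = try_number + 1 "
        f"WHERE id IN ({marks}) AND state IS NULL",
        ti_ids,
    )
    return cur.rowcount


def run_scenario(schedule):
    """Replay the A-then-B sequence; return (rows B updated, final try_number)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ti (id TEXT PRIMARY KEY, state TEXT, try_number INTEGER)")
    conn.execute("INSERT INTO ti VALUES ('ti-1', NULL, 0)")
    stale_ids = ["ti-1"]  # both schedulers decided earlier that this TI is schedulable
    schedule(conn, stale_ids)  # scheduler A schedules it as try 1
    conn.execute("UPDATE ti SET state = 'queued' WHERE id = 'ti-1'")  # A enqueues it
    rows_b = schedule(conn, stale_ids)  # scheduler B acts on its stale view
    try_number = conn.execute("SELECT try_number FROM ti WHERE id = 'ti-1'").fetchone()[0]
    conn.close()
    return rows_b, try_number


unguarded_result = run_scenario(schedule_unguarded)
guarded_result = run_scenario(schedule_guarded)
```

   In this model the unguarded version lets scheduler B update the row and push 
try_number to 2, while the guarded version updates zero rows for B and leaves 
try_number at 1.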
   
   I reproduced it in unit tests, but please also try the fix in your deployment.
   
   > how should I check it?
   
   The case you saw is different from adoption, so it won't be recorded in TI 
history. The adoption case, which was already fixed, happens only when a 
scheduler is marked failed and another scheduler, unable to adopt the task, 
resets it. If you want to check whether there are cases like that in your 
deployment, you can run `select * from task_instance_history where dag_id=...`
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
