rjh-yext opened a new issue, #59378:
URL: https://github.com/apache/airflow/issues/59378

   ### Apache Airflow version
   
   Other Airflow 2/3 version (please specify below)
   
   ### If "Other Airflow 2/3 version" selected, which one?
   
   apache/airflow:3.1.3 Docker image
   
   ### What happened?
   
   We're seeing an issue where a task starts, is stopped because Airflow thinks 
it should not be running, and is then restarted multiple times. As a result, 
the task begins execution multiple times, but Airflow appears to lose track of 
(or ignore) the execution result. Note that the requests to restart occur 
within (milli)seconds of the task first starting. In some cases, there are 
several retries (11 times), and the Dag is marked as `Failed` even though the 
offending tasks are marked as `Skipped`, despite clearly having been attempted 
multiple times.
   
   Our deployment of Airflow has two instances of the Scheduler running, and 
we've seen this error occur both when the task is re/started from the same 
instance, and when it has been re/started from different instances of the 
Scheduler.
   
   One example of the sequence of Scheduler log entries is as follows. There 
do not appear to be any other relevant or associated logs within the time 
frame, but I can provide further logs if requested. In this case, the Dag run 
logs show no indication of any error or restart of the tasks.
   
   `[2025-12-08T08:00:01.522+0000] {{_client.py:1026}} INFO - HTTP Request: 
PATCH 
http://myinstance/execution/task-instances/019afcf9-7ee5-713b-be96-758c026e7d15/run
 "HTTP/1.1 200 OK"`
   
   `2025-12-08 08:00:01 [debug    ] Sending                        [supervisor] 
msg=StartupDetails(ti=TaskInstance(id=UUID('019afcf9-7ee5-713b-be96-758c026e7d15'),
 task_id='MyTask', dag_id='MyDag', 
run_id='scheduled__2025-12-07T08:00:00+00:00', try_number=1, ... `
   
   `[2025-12-08T08:01:22.494+0000] {{_client.py:1026}} INFO - HTTP Request: 
PATCH 
http://myinstance/execution/task-instances/019afcf9-7ee5-713b-be96-758c026e7d15/run
 "HTTP/1.1 200 OK"`
   
   `2025-12-08 08:01:22 [debug    ] Sending                        [supervisor] 
msg=StartupDetails(ti=TaskInstance(id=UUID('019afcf9-7ee5-713b-be96-758c026e7d15'),
 task_id='MyTask', dag_id='MyDag', 
run_id='scheduled__2025-12-07T08:00:00+00:00', try_number=2`
   
   `[2025-12-08T08:01:22.239+0000] {{_client.py:1026}} INFO - HTTP Request: PUT 
http://myinstance/execution/task-instances/019afcf9-7ee5-713b-be96-758c026e7d15/heartbeat
 "HTTP/1.1 409 Conflict"`
   
   `2025-12-08 08:01:22 [error    ] Server indicated the task shouldn't be 
running anymore [supervisor] detail={'detail': {'reason': 'not_running', 
'message': 'TI is no longer in the running state and task should terminate', 
'current_state': 'scheduled'}} status_code=409 
ti_id=UUID('019afcf9-7ee5-713b-be96-758c026e7d15')`
   
   `[2025-12-08T08:01:22.642+0000] {{_client.py:1026}} INFO - HTTP Request: PUT 
http://airflowwebserver.service.nj1.consul:6002/execution/task-instances/019afcf9-7ee5-713b-be96-758c026e7d15/rtif
 "HTTP/1.1 201 Created"`
   
   
   Occasionally, the dag logs will output something like the following before 
restarting the task:
   
   `2025-12-08 01:18:04.771 | Server indicated the task shouldn't be running 
anymore. Terminating process`
   
   `2025-12-08 01:18:04.771 | Task killed!`
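
For anyone trying to check their own logs for the same pattern, the following is a rough sketch of how the scheduler log lines quoted above could be tallied per task instance. It is not part of Airflow. The names `summarize`, `suspicious`, and `SAMPLE` are made up for illustration, and the regular expression assumes the `HTTP Request: ...` line format shown in this report, which may differ in other deployments or versions.

```python
import re
from collections import defaultdict

# Assumed format, taken from the client log lines quoted in this report:
#   ... INFO - HTTP Request: PATCH http://host/execution/task-instances/<uuid>/run "HTTP/1.1 200 OK"
REQUEST_RE = re.compile(
    r'HTTP Request: (?P<method>[A-Z]+)\s+\S*/execution/task-instances/'
    r'(?P<ti_id>[0-9a-f-]{36})/(?P<endpoint>[a-z_-]+)\s+"HTTP/\d\.\d (?P<status>\d{3})'
)


def summarize(log_text):
    """Count PATCH .../run calls and heartbeat 409 responses per task instance id."""
    summary = defaultdict(lambda: {"run_calls": 0, "heartbeat_409": 0})
    # Collapse all whitespace so entries wrapped across several lines still match.
    flat = " ".join(log_text.split())
    for m in REQUEST_RE.finditer(flat):
        ti_id = m.group("ti_id")
        if m.group("method") == "PATCH" and m.group("endpoint") == "run":
            summary[ti_id]["run_calls"] += 1
        elif m.group("endpoint") == "heartbeat" and m.group("status") == "409":
            summary[ti_id]["heartbeat_409"] += 1
    return dict(summary)


def suspicious(summary):
    """Task instance ids that were started more than once or got a heartbeat 409."""
    return sorted(
        ti_id
        for ti_id, counts in summary.items()
        if counts["run_calls"] > 1 or counts["heartbeat_409"] > 0
    )


# Abbreviated copy of the log entries from this report, including the line wrapping.
SAMPLE = """
[2025-12-08T08:00:01.522+0000] {{_client.py:1026}} INFO - HTTP Request:
PATCH
http://myinstance/execution/task-instances/019afcf9-7ee5-713b-be96-758c026e7d15/run
 "HTTP/1.1 200 OK"
[2025-12-08T08:01:22.494+0000] {{_client.py:1026}} INFO - HTTP Request:
PATCH
http://myinstance/execution/task-instances/019afcf9-7ee5-713b-be96-758c026e7d15/run
 "HTTP/1.1 200 OK"
[2025-12-08T08:01:22.239+0000] {{_client.py:1026}} INFO - HTTP Request: PUT
http://myinstance/execution/task-instances/019afcf9-7ee5-713b-be96-758c026e7d15/heartbeat
 "HTTP/1.1 409 Conflict"
"""
```

Passing the text of a scheduler log file to `summarize` and then the result to `suspicious` lists the task instance ids worth a closer look. A task instance that shows more than one `/run` call or any heartbeat `409` matches the behaviour described in this report, though the sketch only counts log lines and says nothing about the cause.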
   
   
   
   ### What you think should happen instead?
   
   _No response_
   
   ### How to reproduce
   
   This seems to occur sporadically, with no consistent pattern. The Dags in 
which it occurs also vary.
   
   ### Operating System
   
   Debian GNU/Linux 12 (bookworm)
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-fab == 3.0.2
   apache-airflow-providers-google == 15.1.0
   apache-airflow-providers-slack == 9.5.0
   apache-airflow-providers-standard == 1.9.0
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   There are two instances of Airflow Scheduler deployed
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.