internetcoffeephone commented on issue #41123:
URL: https://github.com/apache/airflow/issues/41123#issuecomment-2286514059
@EvertonSA You are right, I included these for completeness.
I found another clue: the second (erroneous) execution of the task always
starts about 30 seconds after the first, never off by more than 1 second.
For example:
```
[2024-08-13, 03:55:37 CEST] {{taskinstance.py:2077}} INFO - Dependencies all
met for dep_context=non-requeueable deps ti=<TaskInstance:
test_dag.check_basics scheduled__2024-08-12T00:00:00+00:00 [queued]>
...
[2024-08-13, 03:56:06 CEST] {{taskinstance.py:2067}} INFO - Dependencies not
met for <TaskInstance: test_dag.check_basics
scheduled__2024-08-12T00:00:00+00:00 [running]>, dependency 'Task Instance
State' FAILED: Task is in the 'running' state.
```
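To check the gap on other affected runs, a minimal sketch like the one below can compute the difference between two log lines. It assumes both lines start with the `[YYYY-MM-DD, HH:MM:SS TZ]` prefix shown above and share the same timezone. The helper names `parse_log_timestamp` and `seconds_between` are made up for illustration and are not part of Airflow.

```python
import re
from datetime import datetime

# Matches the "[2024-08-13, 03:55:37 CEST]" prefix seen in the task logs above.
# Assumption: the timezone label is the same on both lines, so it is ignored.
LOG_TS = re.compile(r"^\[(\d{4}-\d{2}-\d{2}, \d{2}:\d{2}:\d{2}) [A-Z]+\]")


def parse_log_timestamp(line: str) -> datetime:
    """Extract the naive timestamp from the start of a log line."""
    match = LOG_TS.match(line)
    if match is None:
        raise ValueError(f"no timestamp prefix found in: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d, %H:%M:%S")


def seconds_between(first_line: str, second_line: str) -> float:
    """Return the number of seconds from first_line to second_line."""
    delta = parse_log_timestamp(second_line) - parse_log_timestamp(first_line)
    return delta.total_seconds()


first = "[2024-08-13, 03:55:37 CEST] {{taskinstance.py:2077}} INFO - Dependencies all met"
second = "[2024-08-13, 03:56:06 CEST] {{taskinstance.py:2067}} INFO - Dependencies not met"

gap = seconds_between(first, second)
print(gap)  # 29.0
```

For the pair above this gives 29 seconds, which fits the "~30 seconds, never off by more than 1 second" pattern.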
The fixed ~30 second gap hints at which config settings may be involved, but
the only relevant one I was able to find was:
`min_serialized_dag_update_interval = 30`
I don't understand the interaction between Airflow and Celery well enough to
know exactly where to look. Either Airflow is scheduling or picking up tasks
that it shouldn't, or Celery is not correctly communicating the state of
running tasks. Any pointers to where in the Airflow code this process happens
would be appreciated.