GitHub user prayaagmahajan created a discussion: Race condition in CeleryExecutor with multiple schedulers causing duplicate TaskInstance execution
**What happened**

We observed a race condition in Airflow 3.1.7 when running with:

- 3 schedulers
- 3 Celery nodes
- Redis broker

A single TaskInstance was executed by **two Celery workers simultaneously**, resulting in inconsistent scheduler state updates.

**Specifically:**

- two different `external_executor_id` values appeared for the same TaskInstance
- the scheduler received executor events with different `try_number` values
- events were processed out of order, corrupting the task state

**Example observed sequence:**

```
executor event → queued  → try_number=1
executor event → failed  → try_number=2
executor event → success → try_number=1
```

This resulted in the scheduler logging `Executor reported state=failed but TaskInstance state=running`, and the DAG run ultimately failing.

Scheduler log:

```
Received executor event with state queued for task instance ... try_number=1
Received executor event with state failed ... try_number=2
Received executor event with state success ... try_number=1
```

Error:

```
Executor reported that the task instance finished with state failed, but the task instance's state attribute is running.
```

Celery worker logs show two separate executions:

```
external_executor_id=fc123 try_number=1
```

and

```
external_executor_id=db234 try_number=2
```

GitHub link: https://github.com/apache/airflow/discussions/63249
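To make the ordering hazard concrete, here is a minimal, hypothetical Python sketch. None of these names come from Airflow's codebase; it only illustrates why applying executor events without comparing the event's `try_number` against the current attempt lets a late event from an earlier try clobber the state of a newer one, matching the out-of-order sequence observed above.

```python
# Hypothetical sketch (not Airflow's actual scheduler code): a stale
# "success" event for try 1 arrives after the "failed" event for try 2.

class TaskInstance:
    def __init__(self, try_number):
        self.state = "running"
        self.try_number = try_number

def apply_event_unguarded(ti, event_state, event_try_number):
    # Naive handling: whatever event arrives last wins, regardless of
    # which attempt it belongs to.
    ti.state = event_state

def apply_event_guarded(ti, event_state, event_try_number):
    # Reject events whose try_number does not match the current attempt,
    # so a late event from try 1 cannot overwrite the outcome of try 2.
    if event_try_number != ti.try_number:
        return False  # stale event, ignore
    ti.state = event_state
    return True

# Events in the arrival order reported above: failed for try 2, then a
# late success for try 1 (the duplicate worker execution).
events = [("failed", 2), ("success", 1)]

ti = TaskInstance(try_number=2)
for state, attempt in events:
    apply_event_unguarded(ti, state, attempt)
print(ti.state)  # "success" — the stale try-1 event won

ti = TaskInstance(try_number=2)
for state, attempt in events:
    apply_event_guarded(ti, state, attempt)
print(ti.state)  # "failed" — the try-1 event was discarded
```

The guard does not prevent the duplicate execution itself (that needs an atomic claim on the TaskInstance before a worker runs it), but it stops stale events from corrupting scheduler state once a duplicate has happened.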
