GitHub user prayaagmahajan created a discussion: Race condition in 
CeleryExecutor with multiple schedulers causing duplicate TaskInstance execution

**What happened**

We observed a race condition in Airflow 3.1.7 when running with:

  - 3 schedulers
  - 3 Celery nodes
  - Redis broker

A single TaskInstance was executed by **two Celery workers simultaneously**, 
resulting in inconsistent scheduler state updates.

**Specifically:**

- two different `external_executor_id` values appeared for the same TaskInstance

- the scheduler received executor events with different `try_number` values

- events were processed out of order, causing the task state to be corrupted

**Example observed sequence:**
```
executor event → queued → try_number=1
executor event → failed → try_number=2
executor event → success → try_number=1
```
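The out-of-order sequence above can be modeled as a consumer replaying executor events. This is an illustrative sketch only, not Airflow's actual event-handling code; the event tuples and the `apply_events` helper are hypothetical:

```python
# Illustrative model of the race: executor events arriving out of order.
# NOT Airflow internals; event shape and function names are hypothetical.

def apply_events(events):
    """Replay executor events; return the final (state, try_number) accepted."""
    state = None
    latest_try = 0
    for ev_state, try_number in events:
        # Guard: drop events belonging to an older attempt.
        if try_number < latest_try:
            continue  # stale event from a previous try_number: ignore it
        latest_try = try_number
        state = ev_state
    return state, latest_try

# The observed sequence from the report:
events = [("queued", 1), ("failed", 2), ("success", 1)]

# Without a staleness guard, the last event wins and try_number=1's
# "success" overwrites try_number=2's "failed".
naive_final = events[-1][0]               # "success" -- stale event wins
guarded_final, _ = apply_events(events)   # "failed"  -- stale event dropped
```

The point of the sketch is that once two attempts exist concurrently, any consumer that applies events in arrival order (rather than by attempt number) can end up in the corrupted state described above.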

This resulted in the scheduler logging:

Executor reported state=failed but TaskInstance state=running

and the DAG run ultimately failing.

Scheduler log:
```
Received executor event with state queued for task instance ...
try_number=1

Received executor event with state failed ...
try_number=2

Received executor event with state success ...
try_number=1
```
Error:

```
Executor reported that the task instance finished with state failed,
but the task instance's state attribute is running.
```

Celery worker logs show two separate executions:

```
external_executor_id=fc123
try_number=1
```

and

```
external_executor_id=db234
try_number=2
```


GitHub link: https://github.com/apache/airflow/discussions/63249
