GitHub user hezeclark added a comment to the discussion: Race condition in CeleryExecutor with multiple schedulers causing duplicate TaskInstance execution
Duplicate TaskInstance execution with multiple schedulers and CeleryExecutor is a known race condition related to the scheduler heartbeat and the task-claiming mechanism.

The root causes:

1. Multiple schedulers racing to queue the same TaskInstance: the database-level locking in Airflow 3.x uses `SELECT ... FOR UPDATE SKIP LOCKED`, but races can still occur under high load.
2. Celery's at-least-once delivery (`task_acks_late`) causing redelivery if a worker acknowledges late.

Mitigations:

1. Enable executor-level deduplication: make sure all schedulers share the same database and `scheduler_heartbeat_sec` is not set too low:

   ```
   [scheduler]
   scheduler_heartbeat_sec = 5   # default; increase under load
   max_tis_per_query = 512       # reduce to lower contention
   ```

2. Use database-level advisory locks: Airflow 3 supports PostgreSQL advisory locks for scheduler coordination:

   ```
   [scheduler]
   use_row_level_locking = True
   ```

3. For Celery + Redis, set `broker_transport_options` correctly:

   ```
   [celery_broker_transport_options]
   visibility_timeout = 86400   # must be longer than your longest task
   ```

4. Consider switching to the Database Executor (Airflow 3.x) for multi-scheduler setups: it uses database-level `SKIP LOCKED`, which is more reliable for deduplication than the Celery broker path.

For immediate mitigation: reduce the number of schedulers to 2, and verify all schedulers are on the same database with high availability (a primary plus a read replica is fine, but all schedulers must write to the same primary).

GitHub link: https://github.com/apache/airflow/discussions/63249#discussioncomment-16112341

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]
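To illustrate why `SKIP LOCKED`-style claiming prevents duplicate pickup, here is a minimal Python sketch. This is not Airflow's actual scheduler code; the names and the in-process lock standing in for database row locks are purely illustrative. Two simulated schedulers race over the same set of task instances, but because each batch is claimed atomically and claimed rows are skipped, no task is picked up twice:

```python
# Hypothetical sketch, NOT Airflow internals: two "schedulers" racing to
# claim TaskInstances. The table_lock plus skip-claimed filter stands in
# for "SELECT ... FOR UPDATE SKIP LOCKED" at the database level.
import threading

# task_id -> claiming scheduler (None = unclaimed)
tasks = {f"ti_{i}": None for i in range(100)}
table_lock = threading.Lock()  # stands in for row-level locking


def claim_batch(scheduler_id, batch_size=10):
    """Repeatedly claim batches of unclaimed tasks until none remain."""
    claimed = []
    while True:
        with table_lock:  # atomic claim: other scheduler skips these rows
            unclaimed = [t for t, owner in tasks.items() if owner is None]
            batch = unclaimed[:batch_size]
            for t in batch:
                tasks[t] = scheduler_id
        if not batch:
            return claimed
        claimed.extend(batch)


results = {}
threads = []
for sid in ("scheduler-1", "scheduler-2"):
    def run(sid=sid):
        results[sid] = claim_batch(sid)
    th = threading.Thread(target=run)
    threads.append(th)
    th.start()
for th in threads:
    th.join()

total = results["scheduler-1"] + results["scheduler-2"]
# Every task claimed exactly once, despite both schedulers racing.
assert len(total) == len(set(total)) == 100
```

Without the atomic claim (e.g. if the filter and the ownership write were not under the same lock), both schedulers could read the same unclaimed row and queue it twice, which is exactly the duplicate-execution symptom described above.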
