mykola-shyshov opened a new pull request, #58896: URL: https://github.com/apache/airflow/pull/58896
# Fix: Prevent duplicate task execution on scheduler crash (Celery executor)

## Problem

Tasks can be executed twice when the scheduler crashes between sending a task to Celery and persisting the `external_executor_id` in the database. This happens because:

1. Task is sent to Celery → Celery generates `task_id`
2. Task starts running on worker → transitions to `RUNNING` state
3. **Crash window**: Scheduler hasn't yet processed events to set `external_executor_id`
4. Scheduler restarts → can't adopt task (no executor ID) → resets task → duplicate execution

Fixes #58570

## Solution

Use the existing TaskInstance UUID (`ti.id`) as the `external_executor_id` by setting it **before** sending the task to Celery, eliminating the race condition.

## Backward Compatibility

Fully backward compatible:

- Event buffer still used as fallback for old workers
- Only sets `external_executor_id` if session is available
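
## Illustrative sketch

The crash window in the Problem section and the reordering in the Solution section can be illustrated with a toy simulation. This is a minimal sketch and not the code in this PR: `TaskInstance`, `FakeBroker`, `send_old`, `send_new`, and `can_adopt` are hypothetical stand-ins written for this example, and none of them are Airflow or Celery APIs. The only behaviour assumed from the description is that adoption after a restart requires a persisted `external_executor_id`.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class TaskInstance:
    """Toy stand-in for a task instance row (hypothetical, not Airflow's model)."""
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    state: str = "queued"
    external_executor_id: Optional[str] = None


class FakeBroker:
    """Hypothetical broker that remembers which task IDs are running on workers."""

    def __init__(self) -> None:
        self.running: Set[str] = set()

    def send(self, task_id: Optional[str] = None) -> str:
        # Without an explicit ID the broker generates its own, as in step 1.
        task_id = task_id or str(uuid.uuid4())
        self.running.add(task_id)
        return task_id


def send_old(ti: TaskInstance, broker: FakeBroker, crash_before_persist: bool) -> None:
    """Old ordering: send first, persist the broker-generated ID afterwards."""
    generated_id = broker.send()
    ti.state = "running"
    if crash_before_persist:
        return  # scheduler dies inside the crash window; the ID is never stored
    ti.external_executor_id = generated_id


def send_new(ti: TaskInstance, broker: FakeBroker) -> None:
    """New ordering: persist ti.id as the executor ID, then send with that ID."""
    ti.external_executor_id = ti.id
    broker.send(task_id=ti.id)
    ti.state = "running"


def can_adopt(ti: TaskInstance, broker: FakeBroker) -> bool:
    """A restarted scheduler can adopt only a task whose stored ID is known to the broker."""
    return ti.external_executor_id is not None and ti.external_executor_id in broker.running


broker = FakeBroker()

old_ti = TaskInstance()
send_old(old_ti, broker, crash_before_persist=True)
print("old ordering adoptable:", can_adopt(old_ti, broker))

new_ti = TaskInstance()
send_new(new_ti, broker)
print("new ordering adoptable:", can_adopt(new_ti, broker))
```

In the old ordering the task is running but carries no stored ID, so a restarted scheduler would reset it and run it again. In the new ordering the ID is stored before the send, so there is no point at which the task is running without a stored ID.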
