mykola-shyshov opened a new pull request, #58896:
URL: https://github.com/apache/airflow/pull/58896

   # Fix: Prevent duplicate task execution on scheduler crash (Celery executor)
   
   ## Problem
   
   Tasks can be executed twice when the scheduler crashes between sending a 
task to Celery and persisting the `external_executor_id` in the database. This 
happens because:
   
   1. Task is sent to Celery → Celery generates `task_id`
   2. Task starts running on worker → transitions to `RUNNING` state
   3. **Crash window**: the scheduler crashes before it has processed the 
executor events that set `external_executor_id`
   4. Scheduler restarts → can't adopt task (no executor ID) → resets task → 
duplicate execution
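
To make the four steps above concrete, here is a minimal toy model of the race. It is a sketch only, not Airflow or Celery code: `TaskRow`, `ToyScheduler`, `adopt_or_reset`, and the state strings are hypothetical names invented for illustration, and the "crash" is simulated by discarding the in-memory scheduler object before it flushes its event buffer.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TaskRow:
    """Hypothetical stand-in for a task instance row in the metadata database."""
    state: str = "queued"
    external_executor_id: Optional[str] = None


@dataclass
class ToyScheduler:
    """Hypothetical scheduler that holds executor ids in memory until events are processed."""
    db: Dict[str, TaskRow]
    event_buffer: Dict[str, str] = field(default_factory=dict)

    def send(self, key: str) -> str:
        # Step 1: the id is generated at send time and kept in memory only.
        executor_id = str(uuid.uuid4())
        self.event_buffer[key] = executor_id
        return executor_id

    def process_events(self) -> None:
        # Normally this persists the ids; in the crash scenario it never runs.
        for key, executor_id in self.event_buffer.items():
            self.db[key].external_executor_id = executor_id
        self.event_buffer.clear()


def adopt_or_reset(db: Dict[str, TaskRow]) -> List[str]:
    """On restart, reset running rows that carry no executor id."""
    reset = []
    for key, row in db.items():
        if row.state == "running" and row.external_executor_id is None:
            row.state = "reset"  # toy state: the task becomes eligible to run again
            reset.append(key)
    return reset


db = {"ti-1": TaskRow()}
scheduler = ToyScheduler(db)
scheduler.send("ti-1")
db["ti-1"].state = "running"      # Step 2: a worker has picked the task up
del scheduler                     # Step 3: crash before process_events() runs
reset_keys = adopt_or_reset(db)   # Step 4: restart cannot adopt, so it resets
print(reset_keys)
```

In this model the row is reset even though a worker is still executing the task, which is the duplicate-execution path described in step 4.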
   
   Fixes #58570
   
   ## Solution
   
   Use the existing TaskInstance UUID (`ti.id`) as the `external_executor_id` 
by setting it **before** sending the task to Celery, eliminating the race 
condition.
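
The ordering this relies on can be sketched as follows. This is an illustrative model under stated assumptions, not the code in this PR: `TaskInstanceRow`, `ToyBroker`, `queue_task`, and `can_adopt` are hypothetical names, and the sketch assumes the same UUID is also handed to the broker as the task id (Celery's `apply_async` accepts a `task_id` argument, but whether the PR uses it that way is an assumption here).

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskInstanceRow:
    """Hypothetical stand-in for a task instance row; `id` plays the role of `ti.id`."""
    id: str
    state: str = "queued"
    external_executor_id: Optional[str] = None


@dataclass
class ToyBroker:
    """Hypothetical broker that records the task ids it was asked to run."""
    sent: List[str] = field(default_factory=list)

    def send(self, task_id: str) -> None:
        self.sent.append(task_id)


def queue_task(ti: TaskInstanceRow, broker: ToyBroker) -> None:
    # Persist the executor id first, so it exists before any worker can start.
    ti.external_executor_id = ti.id
    # Only then hand the task to the broker, reusing the same id (assumption).
    broker.send(task_id=ti.id)


def can_adopt(ti: TaskInstanceRow) -> bool:
    """After a restart, a row is adoptable only if it carries an executor id."""
    return ti.external_executor_id is not None


ti = TaskInstanceRow(id=str(uuid.uuid4()))
broker = ToyBroker()
queue_task(ti, broker)
ti.state = "running"          # a worker starts the task
adoptable = can_adopt(ti)     # simulated restart: the id is already persisted
print(adoptable)
```

Because the id is known before the send, there is no interval in which a running task lacks an `external_executor_id`, so a restarted scheduler in this model can always adopt it.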
   
   ## Backward Compatibility
   
   This change is fully backward compatible:
   - The event buffer is still used as a fallback for old workers
   - `external_executor_id` is set only if a session is available
   


