Re: [PR] Respect task retries for signal killed tasks [airflow]

via GitHub Wed, 17 Sep 2025 12:18:34 -0700


kaxil commented on PR #55767:
URL: https://github.com/apache/airflow/pull/55767#issuecomment-3304266244


   > When tasks are killed by system signals (SIGKILL for OOM, SIGTERM for 
worker restarts), they immediately go to FAILED state instead of respecting the 
configured task retries and going to UP_FOR_RETRY state. This creates unexpected 
behavior where exception-based failures respect retries but signal-based 
failures don't.
   
   How common is this scenario (excluding manually killing the task process)? 
Since the supervisor and task processes run in the same container, wouldn't 
an OOM condition typically kill the entire container rather than just the 
individual task process?
   
   In the more common case where the entire container gets OOM-killed:
   1) The supervisor process would also die
   2) Heartbeats to the scheduler would fail
   3) The scheduler would receive a FAILED executor event and handle retries 
through the normal `process_executor_events()` → `handle_failure()` path
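
For reference, the distinction under discussion (a task process that dies from a signal versus one that exits with an error) can be sketched as below. This is only an illustration, not Airflow's supervisor code: `killed_by`, `next_state`, `attempt`, and `retries` are hypothetical names, and the only library behavior relied on is that `subprocess` on POSIX reports a child killed by signal N with return code -N.

```python
import signal
from typing import Optional


def killed_by(return_code: int) -> Optional[str]:
    """Return the signal name if the child died from a signal, else None.

    Assumes the POSIX subprocess convention: a negative return code -N
    means the child was terminated by signal N.
    """
    if return_code < 0:
        return signal.Signals(-return_code).name
    return None


def next_state(return_code: int, attempt: int, retries: int) -> str:
    """Hypothetical retry decision: treat signal deaths like any other failure.

    `attempt` is 1-based; `retries` is the number of extra attempts allowed.
    """
    if return_code == 0:
        return "success"
    # Non-zero exit or signal death: retry while attempts remain.
    return "up_for_retry" if attempt <= retries else "failed"


if __name__ == "__main__":
    rc = -signal.SIGTERM  # what a SIGTERM-killed child would report on POSIX
    print(killed_by(rc), next_state(rc, attempt=1, retries=2))
```

In this sketch a signal death and an exception-style non-zero exit take the same branch, which is the behavior the PR description asks for when only the task process is killed.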


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
