potiuk commented on issue #57174:
URL: https://github.com/apache/airflow/issues/57174#issuecomment-3446710082

   Correct - currently the timeout is handled in the task itself via SIGALRM - and if another signal (e.g. SIGSEGV) comes in, it might leave the task in a hanging state. This usually happens when you have native code running in a long tight loop that never yields control back to the Python interpreter to handle signals. We've discussed it in the past, and the solution is to handle the timeout in the supervisor (which runs in another, parent process - tasks run in forked processes).
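   A minimal sketch (simplified, not Airflow's actual implementation) of the in-task SIGALRM approach. It interrupts pure-Python code fine, but a Python-level signal handler only runs when control returns to the interpreter - which never happens while native code spins in a tight loop:

```python
import signal

class TaskTimeout(Exception):
    """Raised when the task exceeds its execution timeout."""

def _on_alarm(signum, frame):
    # Delivered by the kernel on SIGALRM, but only *executed* by CPython
    # between bytecode instructions - native code never reaches this.
    raise TaskTimeout("task exceeded its execution timeout")

def run_with_timeout(fn, seconds):
    """Run fn(), raising TaskTimeout if it takes longer than `seconds`."""
    signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)          # schedule SIGALRM in `seconds`
    try:
        return fn()
    finally:
        signal.alarm(0)            # cancel any pending alarm
```

   This is why moving the timeout out of the task process entirely is more robust: the supervisor can kill the forked task from the outside regardless of what the task is doing.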
   
   In order to communicate that, we have to add a new task -> supervisor API (to send the timeout information after parsing the DAG, because the supervisor does not have that information as it does not parse the task). This would generally handle all the possible cases where the task hangs - including a possible escalation of signals to kill such forked tasks (SIGTERM followed by SIGKILL after a short additional timeout if the task does not exit).
   
   That's all possible, and if someone would like to take on that task, it's not even that difficult. I've marked it as a good first issue.
   
   cc: @ashb @amoghrajesh if you have something to add.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
