potiuk commented on PR #39543:
URL: https://github.com/apache/airflow/pull/39543#issuecomment-2119277981

   > @RNHTTR good recommendation. What if we document that better, or do
   > something even better. In the on_kill callback for the baseoperator, we
   > can add enough information for the users to send them in various
   > debugging paths. TLDR; Add all the possible causes we can think of there.
   > Since on_kill callback will only be called in case of, well kill.
   > 
   > ```
   >     def on_kill(self):
   >         self.log.info("SIGKILL was called. It could be because of: a)...b)....")
   > ```
   
   SIGKILL will never trigger `on_kill`. The `-9` signal cannot be handled by
the process at all, so it cannot be handled in `on_kill` either - this is why
we can only guess here why the process was killed. The `on_kill` method name
really has no relation to SIGKILL (-9) - it is called when the task is stopped
more gracefully rather than by -9.
   
   I think the right approach is to explain more about what happens - the
current description is rather vague. It should say that the task process was
killed externally by -9 and list possible reasons why that might happen. OOM is
one of the reasons, but there are others - for example, when a machine/pod is
evicted, -9 might be sent to all processes that do not respond to other
attempts to kill them. It would be great to have a little more description of
all that and give the user some direction to look in. Usually it's a signal
sent by the deployment (K8S), but there might be other reasons - I think the
Airflow standard task runner heartbeat might also actually SIGKILL such a
process if it becomes unresponsive (and there is likely another log written
somewhere in that case) - it would be worth checking. So just listing a few
possible reasons here (and making sure the list is open-ended) could be useful.
Maybe we should even have a section in our FAQ, "why can my task get
sig-killed", with a bit more description there.
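
As a rough sketch of what such an open-ended message could look like (this is not the actual Airflow implementation; `explain_return_code` and `POSSIBLE_SIGKILL_CAUSES` are hypothetical names, and the listed causes are only the ones mentioned in this comment):

```python
import signal

# Hypothetical, deliberately open-ended list of causes; not exhaustive.
POSSIBLE_SIGKILL_CAUSES = (
    "the process ran out of memory (OOM)",
    "the machine/pod was evicted and unresponsive processes were force-killed",
    "the deployment (for example K8S) sent the signal",
)


def explain_return_code(return_code: int) -> str:
    """Build a log message for the return code of a finished task process."""
    if return_code >= 0:
        return f"Task process exited with code {return_code}."
    signum = -return_code
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Not a signal number known on this platform.
        name = f"signal {signum}"
    message = (
        f"Task process was terminated externally by {name} "
        f"(return code {return_code})."
    )
    if signum == signal.SIGKILL:
        causes = "; ".join(POSSIBLE_SIGKILL_CAUSES)
        message += f" Possible reasons include, but are not limited to: {causes}."
    return message


if __name__ == "__main__":
    print(explain_return_code(-9))
```

The "but are not limited to" wording keeps the list open-ended, so users are not led to assume OOM is the only explanation.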
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
