potiuk commented on PR #39543:
URL: https://github.com/apache/airflow/pull/39543#issuecomment-2119277981
> @RNHTTR good recommendation. What if we document that better, or do
> something even better. In the on_kill callback for the baseoperator, we can add
> enough information for the users to send them in various debugging paths. TLDR;
> Add all the possible causes we can think of there. Since on_kill callback will
> only be called in case of, well kill.
>
> ```
> def on_kill(self):
>     self.log.info("SIGKILL was called. It could be because of: a)...b)....")
> ```
SIGKILL will never trigger `on_kill`. The `-9` signal cannot really be
handled in `on_kill` - this is why we are guessing here why processes were
killed. The `on_kill` method name really has no relation to SIGKILL (-9) -
it's called when the task is stopped more gracefully rather than by -9.
I think the right approach is to explain more about what happens - the
current description is rather vague. It should say that the task process was
killed externally by -9, and list possible reasons why that might happen. OOM
is one of the reasons, but there are others - for example, when a machine/pod
is evicted, -9 might be sent to all processes that do not respond to other
attempts to kill them. I think it would be great to have a little more
description of all that and give the user some direction on where to look -
usually it's a signal sent by the deployment (K8S), but there might be other
reasons. I think the Airflow standard task runner heartbeat might also
actually SIGKILL such a process if it becomes unresponsive (and there is
likely another log written somewhere in this case) - it would be worth
checking. So just listing a few things here as possible reasons (and making
sure the list is open-ended) could be useful. Maybe we should even have a
section somewhere in our FAQ, "why my task can get sig-killed", with a bit
more description there.
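
As a rough sketch of what such an open-ended message could look like (this is not the actual Airflow code or wording; `describe_exit` and `POSSIBLE_SIGKILL_CAUSES` are hypothetical names, and the listed causes are just the ones mentioned in this comment, including the unverified one about the task runner):

```python
import signal

# Illustrative, deliberately open-ended list based on the discussion above.
# The last entry is a guess that still needs to be verified.
POSSIBLE_SIGKILL_CAUSES = (
    "the process ran out of memory (OOM)",
    "the machine or pod was evicted and unresponsive processes were sent -9",
    "the deployment (for example K8S) sent the signal",
    "the task runner killed a process that became unresponsive (unverified)",
)


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a human-readable explanation."""
    if returncode >= 0:
        return f"Process exited with code {returncode}."
    signum = -returncode  # subprocess reports "killed by signal N" as -N
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = f"signal {signum}"
    message = f"Process was terminated by {name}."
    if signum == signal.SIGKILL:
        causes = "; ".join(POSSIBLE_SIGKILL_CAUSES)
        message += f" Possible reasons include (not exhaustive): {causes}."
    return message


if __name__ == "__main__":
    print(describe_exit(-9))
```

The point of keeping the causes in a separate tuple is that the list stays easy to extend as more reasons are found.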