potiuk commented on PR #61627:
URL: https://github.com/apache/airflow/pull/61627#issuecomment-3905278479

   I would really like @ashb, @kaxil and @amoghrajesh to review this and figure out what the desired signal handling in this case should be.
   
   It might have a quite profound impact on how our signal handling is implemented in Airflow.
   
   So far we have assumed that when we run any component except the `celery worker`, we rely on signal propagation to all processes in the process group via the `dumb-init` entrypoint process - and all our images have their entrypoint (and thus init process) set to `dumb-init`.
   
   Our images have `dumb-init` set as the entrypoint, and the default behaviour is that signals received by `dumb-init` are forwarded to all processes in the main process group. This means that as long as a process does not create a new process group when it forks its children, all such created processes will receive the signal.
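   
   As a minimal sketch of that mechanism (my own illustration, not code from the image): a signal delivered to a process group reaches every member, while a child that starts its own session escapes it.
   
   ```python
   import os
   import signal
   import subprocess
   import time
   
   # Child A stays in our process group (the default for subprocess.Popen).
   in_group = subprocess.Popen(["sleep", "300"])
   # Child B starts its own session (and thus its own process group), so a
   # group-wide signal will not reach it.
   own_group = subprocess.Popen(["sleep", "300"], start_new_session=True)
   
   time.sleep(1)
   # Ignore SIGTERM in this process so the demo survives its own group signal.
   signal.signal(signal.SIGTERM, signal.SIG_IGN)
   os.killpg(os.getpgrp(), signal.SIGTERM)
   
   time.sleep(1)
   print("same group:", in_group.poll())    # -15: terminated by SIGTERM
   print("own group:", own_group.poll())    # None: still running
   own_group.terminate()
   ```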
   
   This was a deliberate decision, because a lot of potential task and parsing-process libraries might not properly propagate signals. A typical example is when we run bash tasks: bash silently swallows signals and does not propagate them to its child processes unless specific handlers are configured in the bash script. A lot of people do not know that this happens and start bash subprocesses without realizing that signals will not be propagated - and since Airflow is pretty versatile and you can generally write any code, we decided to send signals to all the processes in the group (which is the default dumb-init behaviour).
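   
   For illustration, a small sketch (my own, with a long-running `sleep` standing in for a task): signalling only the bash process leaves the child it spawned running, while signalling the whole group - what dumb-init does by default - takes both down.
   
   ```python
   import os
   import signal
   import subprocess
   import time
   
   # Two commands force bash to fork `sleep` as a child instead of exec-ing it.
   proc = subprocess.Popen(["bash", "-c", "sleep 300; echo done"],
                           start_new_session=True)
   time.sleep(1)
   pgid = os.getpgid(proc.pid)            # remember bash's process group
   
   # Signal only bash: with no trap configured, it exits without forwarding
   # anything, and the `sleep` it forked keeps running.
   os.kill(proc.pid, signal.SIGTERM)
   time.sleep(1)
   print("bash:", proc.poll())            # bash is gone, the sleep is not
   
   # Signal the whole group instead - this is what reaches the orphaned sleep.
   os.killpg(pgid, signal.SIGTERM)
   ```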
   
   The only exception is the celery worker, because the celery worker handles signals deliberately - the first signal it receives puts the worker into offline mode, where it does not accept any new tasks and waits for all running tasks to complete. This is used, for example, by the KEDA autoscaler to automatically scale celery workers down. Celery handles propagating signals to its subprocesses, including signal escalation with timeouts in case a signal is not acted upon - so even if a bash subprocess swallows the signal, celery will eventually escalate to SIGKILL and kill such tasks (but if tasks handle signals, they will be stopped nicely and cleanly before the escalation).
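   
   The escalation pattern itself boils down to roughly the following (a generic sketch of "terminate, wait, then kill", not Celery's actual implementation; the inline task simulating a signal-swallowing child is made up for the example):
   
   ```python
   import signal
   import subprocess
   import sys
   
   def stop_with_escalation(proc: subprocess.Popen, grace_seconds: float = 30.0) -> int:
       """Ask `proc` to stop with SIGTERM; escalate to SIGKILL after the grace period."""
       proc.send_signal(signal.SIGTERM)           # polite request to stop
       try:
           return proc.wait(timeout=grace_seconds)
       except subprocess.TimeoutExpired:
           proc.kill()                            # SIGKILL cannot be ignored
           return proc.wait()
   
   # A task that ignores SIGTERM still goes away once the grace period expires.
   task = subprocess.Popen([sys.executable, "-c",
       "import signal, time; signal.signal(signal.SIGTERM, signal.SIG_IGN); time.sleep(300)"])
   print("exit status:", stop_with_escalation(task, grace_seconds=5))
   ```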
   
   So instead of sending signals to the whole process group, celery is configured so that only the master worker process receives signals and takes care of the propagation itself - `DUMB_INIT_SETSID` should be set to 0 for celery pods. This is how it is implemented in our helm chart as well, and the documentation of the Airflow Docker image describes this behaviour:
   
   
https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation 
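   
   In terms of what ends up being signalled, the two modes map to roughly this (an illustrative sketch of the behaviour described above, not dumb-init's actual code; the `forward` helper and its parameters are made up for the example):
   
   ```python
   import os
   import signal
   
   def forward(child_pid: int, signum: int) -> None:
       """Forward a received signal the way the two dumb-init modes would."""
       if os.environ.get("DUMB_INIT_SETSID", "1") == "1":
           # Default mode: the signal goes to the child's whole process group.
           os.killpg(os.getpgid(child_pid), signum)
       else:
           # DUMB_INIT_SETSID=0: only the direct child (e.g. the celery master
           # worker) receives it and is responsible for propagating it further.
           os.kill(child_pid, signum)
   ```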
   
   I am not sure what the thinking about the supervisor's behaviour was when it was implemented - whether it handles propagation of signals to its child processes itself, or whether it relies on the whole process group receiving the signals - but I guess the latter. It would, however, be great to document the behaviour and expectations for signal handling in this case, because I see two assumptions here that are wrong:
   
   1) The supervisor is assumed to be the init process - which I think is pretty wrong. A supervisor written in Python should never be the init process, unless it implements everything an init process should do. An init process has to do more than signal propagation: it should also reap processes orphaned by other subprocesses (so-called zombies), and it should wait for all the processes it sent signals to before exiting - otherwise they will not be terminated cleanly by the kernel, which might mean filesystem corruption for example. This is why all docker containers (including the Airflow reference image) should have a "real" init process as the entrypoint - in our case this is `dumb-init` - and it is absolutely not recommended for any production image to have an `init` process that is not a "real" init process (some people set the bash interpreter as the entrypoint, which is absolutely wrong). See the sketch after this list for the minimum a real init has to take care of.
   
   2) The init process is assumed not to forward the signal to the whole process group of the forked command - which is exactly what happens in the Airflow reference image.
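   
   To make the expectations in point 1 concrete, here is a minimal sketch (my illustration only, and explicitly not a recommendation to roll your own init) of what a real PID 1 has to do beyond forwarding signals when it runs as the container entrypoint:
   
   ```python
   import os
   import signal
   import subprocess
   import sys
   
   def run_as_minimal_init(argv: list) -> int:
       """Run `argv` as the main child, forward signals, and reap everything."""
       child = subprocess.Popen(argv)
   
       # Forward termination signals to the child's whole process group,
       # mirroring dumb-init's default behaviour.
       def forward(signum, frame):
           try:
               os.killpg(os.getpgid(child.pid), signum)
           except ProcessLookupError:
               pass
   
       for sig in (signal.SIGTERM, signal.SIGINT):
           signal.signal(sig, forward)
   
       # Reap every child that exits - when running as PID 1, orphaned
       # grandchildren are re-parented here and would otherwise stay as zombies.
       while True:
           try:
               pid, status = os.wait()
           except ChildProcessError:          # nothing left to wait for
               return 0
           if pid == child.pid:               # main child gone: exit with its status
               return os.waitstatus_to_exitcode(status)
   
   if __name__ == "__main__":
       sys.exit(run_as_minimal_init(sys.argv[1:]))
   ```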
    
   I think before we continue with this PR we should at the very least get these assumptions stated and agreed on.

