This is an automated email from the ASF dual-hosted git repository.
potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git
The following commit(s) were added to refs/heads/main by this push:
new 9e13e45 Fixes warm shutdown for celery worker. (#18068)
9e13e45 is described below
commit 9e13e450032f4c71c54d091e7f80fe685204b5b4
Author: Jarek Potiuk <[email protected]>
AuthorDate: Fri Sep 10 20:13:31 2021 +0200
Fixes warm shutdown for celery worker. (#18068)
The way dumb-init propagated signals by default prevented the
Celery worker from handling termination well.
The default behaviour of dumb-init is to propagate signals to the
process group rather than to the single child it spawns. This is
protective behaviour in case a user runs a 'bash -c' command
without 'exec': in this case signals should be sent not only
to bash but also to the process(es) it creates, otherwise
bash exits without propagating the signal and you need a second
signal to kill all processes.
However, some Airflow processes (in particular the airflow celery
worker) behave responsibly and handle signals appropriately:
when the first signal is received, the worker switches to offline
mode and lets all workers terminate (until the grace period expires),
resulting in a Warm Shutdown.
Therefore we can disable the protection of dumb-init and let it
propagate the signal to only the single child it spawns in the
Helm Chart. The image documentation was also updated to include an
explanation of signal propagation. For explicitness, the
DUMB_INIT_SETSID variable has been set to 1 in the image as well.
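
The difference between the two propagation modes can be illustrated with a
small Python sketch. It is not dumb-init or Celery code: the worker script,
its "main" and "task" roles, and the run_demo helper are hypothetical
stand-ins. The assumption is that "main" handles SIGTERM and waits for its
child, while "task" has no handler and dies if the signal reaches it.
Signalling the whole group mimics DUMB_INIT_SETSID=1; signalling only the
direct child mimics DUMB_INIT_SETSID=0. The sketch assumes a POSIX system.

```python
import os
import signal
import subprocess
import sys
import tempfile
from typing import List

# Hypothetical stand-in for a worker. "main" records SIGTERM and keeps waiting
# for its "task" child; "task" keeps the default SIGTERM action (terminate).
SCRIPT = r'''
import signal, subprocess, sys, time

if sys.argv[1] == "task":
    print("task:started", flush=True)
    time.sleep(1.5)
    print("task:finished", flush=True)
else:
    got = []
    signal.signal(signal.SIGTERM, lambda signum, frame: got.append(signum))
    task = subprocess.Popen([sys.executable, sys.argv[0], "task"])
    print("main:ready", flush=True)
    task.wait()
    print("main:signals", len(got), "task exit code", task.returncode, flush=True)
'''


def run_demo(whole_group: bool) -> List[str]:
    """Start the worker, send one SIGTERM, and return everything it printed."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "worker.py")
        with open(path, "w") as fh:
            fh.write(SCRIPT)
        # start_new_session=True makes "main" the leader of a new process
        # group, so its pid is also the process group id.
        proc = subprocess.Popen(
            [sys.executable, path, "main"],
            stdout=subprocess.PIPE,
            text=True,
            start_new_session=True,
        )
        lines: List[str] = []
        # Wait until both processes are up before sending the signal.
        while not {"main:ready", "task:started"} <= set(lines):
            line = proc.stdout.readline()
            if not line:
                raise RuntimeError("worker exited before it was ready")
            lines.append(line.strip())
        if whole_group:
            os.killpg(proc.pid, signal.SIGTERM)  # every process in the group
        else:
            os.kill(proc.pid, signal.SIGTERM)  # only the direct child
        lines += [line.strip() for line in proc.stdout]
        proc.wait()
        return lines


if __name__ == "__main__":
    print("single child:", run_demo(whole_group=False))
    print("whole group: ", run_demo(whole_group=True))
```

With whole_group=False the task should run to completion and print
"task:finished", which is the warm-shutdown-like outcome. With
whole_group=True the task receives SIGTERM itself and is expected to be
killed before it finishes.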
Fixes #18066
---
Dockerfile | 1 +
chart/templates/workers/worker-deployment.yaml | 3 ++
docs/docker-stack/entrypoint.rst | 41 ++++++++++++++++++++++++++
3 files changed, 45 insertions(+)
diff --git a/Dockerfile b/Dockerfile
index 405470d..1890a87 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -479,6 +479,7 @@ LABEL org.apache.airflow.distro="debian" \
org.opencontainers.image.title="Production Airflow Image" \
org.opencontainers.image.description="Reference, production-ready Apache Airflow image"
+ENV DUMB_INIT_SETSID="1"
ENTRYPOINT ["/usr/bin/dumb-init", "--", "/entrypoint"]
CMD []
diff --git a/chart/templates/workers/worker-deployment.yaml b/chart/templates/workers/worker-deployment.yaml
index 68a0e18..023ffa4 100644
--- a/chart/templates/workers/worker-deployment.yaml
+++ b/chart/templates/workers/worker-deployment.yaml
@@ -180,6 +180,9 @@ spec:
envFrom:
{{- include "custom_airflow_environment_from" . | default "\n []" | indent 10 }}
env:
+ # Only signal the main process, not the process group, to make Warm Shutdown work properly
+ - name: DUMB_INIT_SETSID
+ value: "0"
{{- include "custom_airflow_environment" . | indent 10 }}
{{- include "standard_airflow_environment" . | indent 10 }}
{{- if .Values.workers.kerberosSidecar.enabled }}
diff --git a/docs/docker-stack/entrypoint.rst b/docs/docker-stack/entrypoint.rst
index eb880d9..7db5b5d 100644
--- a/docs/docker-stack/entrypoint.rst
+++ b/docs/docker-stack/entrypoint.rst
@@ -161,6 +161,47 @@ If there are any other arguments - they are simply passed to the "airflow" comma
> docker run -it apache/airflow:2.1.2-python3.6 version
2.1.2
+Signal propagation
+------------------
+
+Airflow uses ``dumb-init`` to run as "init" in the entrypoint. This is in order to propagate
+signals and reap child processes properly. This means that the process that you run does not have
+to install signal handlers to work properly and be killed when the container is gracefully terminated.
+The behaviour of signal propagation is configured by ``DUMB_INIT_SETSID`` variable which is set to
+``1`` by default - meaning that the signals will be propagated to the whole process group, but you can
+set it to ``0`` to enable ``single-child`` behaviour of ``dumb-init`` which only propagates the
+signals to only single child process.
+
+The table below summarizes ``DUMB_INIT_SETSID`` possible values and their use cases.
+
++----------------+----------------------------------------------------------------------+
+| Variable value | Use case                                                             |
++----------------+----------------------------------------------------------------------+
+| 1 (default)    | Propagates signals to all processes in the process group of the main |
+|                | process running in the container.                                    |
+|                |                                                                      |
+|                | If you run your processes via ``["bash", "-c"]`` command and bash    |
+|                | spawn new processes without ``exec``, this will help to terminate    |
+|                | your container gracefully as all processes will receive the signal.  |
++----------------+----------------------------------------------------------------------+
+| 0              | Propagates signals to the main process only.                         |
+|                |                                                                      |
+|                | This is useful if your main process handles signals gracefully.      |
+|                | A good example is warm shutdown of Celery workers. The ``dumb-init`` |
+|                | in this case will only propagate the signals to the main process,    |
+|                | but not to the processes that are spawned in the same process        |
+|                | group as the main one. For example in case of Celery, the main       |
+|                | process will put the worker in "offline" mode, and will wait         |
+|                | until all running tasks complete, and only then it will              |
+|                | terminate all processes.                                             |
+|                |                                                                      |
+|                | For Airflow's Celery worker, you should set the variable to 0        |
+|                | and either use ``["celery", "worker"]`` command.                     |
+|                | If you are running it through ``["bash", "-c"]`` command,            |
+|                | you need to start the worker via ``exec airflow celery worker``      |
+|                | as the last command executed.                                        |
++----------------+----------------------------------------------------------------------+
+
Additional quick test options
-----------------------------