This is an automated email from the ASF dual-hosted git repository.
potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git
The following commit(s) were added to refs/heads/main by this push:
new ef1498831a Add section about live-upgrading Airflow (#36637)
ef1498831a is described below
commit ef1498831a54c76a928d0a56eaf4c232925c0c65
Author: Jarek Potiuk <[email protected]>
AuthorDate: Sat Jan 6 21:42:07 2024 +0100
Add section about live-upgrading Airflow (#36637)
Our users often ask about live-upgrading Airflow, and the answer to what
can be live-upgraded, and how, is not obvious - it depends on a number of
factors, most importantly the type of deployment you run and the type of
executor you use.
This PR adds a basic description of it - following the recent update
explaining the different live-upgrade scenarios available.
---
.../production-deployment.rst | 68 +++++++++++++++++++++-
docs/apache-airflow/core-concepts/overview.rst | 6 ++
2 files changed, 73 insertions(+), 1 deletion(-)
diff --git
a/docs/apache-airflow/administration-and-deployment/production-deployment.rst
b/docs/apache-airflow/administration-and-deployment/production-deployment.rst
index 6f414ed194..281dc7f658 100644
---
a/docs/apache-airflow/administration-and-deployment/production-deployment.rst
+++
b/docs/apache-airflow/administration-and-deployment/production-deployment.rst
@@ -83,7 +83,6 @@ See :doc:`logging-monitoring/logging-tasks` for
configurations.
The logs only appear in your DFS after the task has finished. You can view
the logs while the task is
running in UI itself.
-
Configuration
=============
@@ -126,6 +125,73 @@ Helm Chart for Kubernetes
`Helm <https://helm.sh/>`__ provides a simple mechanism to deploy software to
a Kubernetes cluster. We maintain
:doc:`an official Helm chart <helm-chart:index>` for Airflow that helps you
define, install, and upgrade deployment. The Helm Chart uses :doc:`our official
Docker image and Dockerfile <docker-stack:index>` that is also maintained and
released by the community.
+
+Live-upgrading Airflow
+======================
+
+Airflow is by design a distributed system, and while the
+:ref:`basic Airflow deployment <overview-basic-airflow-architecture>` usually requires a complete
+Airflow restart to upgrade, it is possible to upgrade Airflow without any downtime when you run
+Airflow in a :ref:`distributed deployment <overview-distributed-airflow-architecture>`.
+
+Such a live upgrade is only possible when there are no changes in the Airflow metadata database
+schema, so you should aim to do it when upgrading between patch-level (bugfix) versions of the
+same minor Airflow version, or between adjacent minor (feature) versions of Airflow - after
+reviewing the :doc:`release notes <../release_notes>` and :doc:`../migrations-ref` and making
+sure there are no changes in the database schema between them.
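One practical way to confirm that the schema did not change is to compare the Alembic revision recorded in the metadata database before and after a test upgrade in staging. This is a minimal sketch, assuming the metadata database is reachable through ``airflow db shell`` and that your database client reads SQL from stdin:

```shell
# Print the current Alembic schema revision stored in the metadata DB.
# Assumption for this sketch: `airflow db shell` opens your configured DB
# client (psql/mysql/sqlite3), which reads the query from stdin.
echo "SELECT version_num FROM alembic_version;" | airflow db shell
```

If the revision printed before and after the upgrade is identical, no database migration was applied between the two versions.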
+
+In some cases, when the database migration is not significant, such a live upgrade might also be
+possible across minor versions by upgrading the Airflow database first. However, this is not
+recommended and you should only do it at your own risk, carefully reviewing the modifications to
+be applied to the database schema and assessing the risk of such an upgrade - it requires deep
+knowledge of the Airflow database :doc:`../database-erd-ref` and a review of the
+:doc:`../migrations-ref`. You should always thoroughly test such an upgrade in a staging
+environment first. Usually the cost of preparing such a live upgrade will be higher than the
+cost of a short Airflow downtime, so we strongly discourage such live upgrades.
+
+Make sure to test such a live-upgrade procedure in a staging environment before you perform it
+in production, to avoid any surprises and side-effects.
+
+When it comes to live-upgrading the ``Webserver`` and ``Triggerer`` components: if you run them
+in separate environments and have more than one instance of each, you can rolling-restart them
+one by one, without any downtime. This should usually be done as the first step in your upgrade
+procedure.
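On Kubernetes, such a rolling restart can be sketched as below; the Deployment names are assumptions of this sketch and depend on how your release is laid out:

```shell
# Roll the stateless components one Deployment at a time, waiting for each
# rollout to complete before starting the next (names are hypothetical).
for deploy in airflow-webserver airflow-triggerer; do
  kubectl rollout restart "deployment/${deploy}"
  kubectl rollout status "deployment/${deploy}" --timeout=10m
done
```

Waiting on ``rollout status`` between components keeps at least one healthy instance serving at all times.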
+
+When you run a deployment with a standalone ``DAG processor``, in a
+:ref:`separate DAG processing deployment <overview-separate-dag-processing-airflow-architecture>`,
+the ``DAG processor`` is not horizontally scaled - even if you run more than one, there is
+usually a single ``DAG processor`` running at a time for a specific folder - so you can simply
+stop it and start the new one. Since the ``DAG processor`` is not a critical component, it is
+fine for it to experience a short downtime.
+
+When it comes to upgrading the schedulers and workers, you can use the live
upgrade capabilities
+of the executor you use:
+
+* For the :doc:`Local executor <../core-concepts/executor/local>`, your tasks run as
+  subprocesses of the scheduler and you cannot upgrade the scheduler without killing those
+  tasks. You can either pause all your DAGs and wait for the running tasks to complete, or stop
+  the scheduler and kill all the tasks it runs - in which case you will need to clear and
+  restart those tasks manually after the upgrade is completed (or rely on ``retry`` being set
+  for the stopped tasks).
+
+* For the :doc:`Celery executor <../core-concepts/executor/celery>`, you first have to put your
+  workers in offline mode (usually by sending a single ``TERM`` signal to the workers), wait
+  until the workers finish all their running tasks, and then upgrade the code (for example by
+  replacing the image the workers run on and restarting them). You can monitor your workers via
+  the ``flower`` monitoring tool and watch the number of running tasks go down to zero. Once the
+  workers are upgraded, they will automatically be put back in online mode and start picking up
+  new tasks. You can then upgrade the ``Scheduler`` in a rolling-restart mode.
+
+* For the :doc:`Kubernetes executor <../core-concepts/executor/kubernetes>`, you can upgrade the
+  scheduler, triggerer and webserver in a rolling-restart mode, and generally you should not
+  worry about the workers, as they are managed by the Kubernetes cluster and their tasks will be
+  automatically adopted by the ``Schedulers`` when those are upgraded and restarted.
+
+* For the :doc:`CeleryKubernetesExecutor <../core-concepts/executor/celery-kubernetes>`, you
+  follow the same procedure as for the ``CeleryExecutor`` - put the workers in offline mode,
+  wait for the running tasks to complete, upgrade the workers, and then upgrade the scheduler,
+  triggerer and webserver in a rolling-restart mode - which should also adopt the tasks run via
+  the ``KubernetesExecutor`` part of the executor.
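The Celery warm-shutdown step above can be sketched as follows; the PID-file path and the flower port are assumptions of this sketch, not fixed Airflow defaults:

```shell
# Warm shutdown: on TERM, a Celery worker stops accepting new tasks and
# exits once its currently running tasks finish (hypothetical PID file).
kill -TERM "$(cat /var/run/airflow-worker.pid)"

# Inspect the drain via flower's HTTP API until the worker reports no
# active tasks (flower is assumed to listen on localhost:5555 here).
curl -s http://localhost:5555/api/workers
```

Only once all workers report zero active tasks should you replace the worker image and restart them.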
+
+Most of the rolling-restart upgrade scenarios are implemented in the :doc:`helm-chart:index`, so
+you can use it to upgrade your Airflow deployment without any downtime - especially when you
+perform patch-level upgrades of Airflow.
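With the official Helm chart, such a patch-level live upgrade can be sketched as a single ``helm upgrade``; the release name, namespace and version numbers below are illustrative assumptions:

```shell
# Upgrade a chart-managed deployment to a new patch-level image.
# `airflowVersion` and `defaultAirflowTag` are values of the official
# chart; release name, namespace and the version itself are assumptions.
helm repo update
helm upgrade airflow apache-airflow/airflow \
  --namespace airflow \
  --set airflowVersion=2.8.1 \
  --set defaultAirflowTag=2.8.1
```

The chart then performs rolling restarts of the stateless components for you.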
+
.. _production-deployment:kerberos:
Kerberos-authenticated workers
diff --git a/docs/apache-airflow/core-concepts/overview.rst
b/docs/apache-airflow/core-concepts/overview.rst
index 48487cace8..9ee9ea8b0b 100644
--- a/docs/apache-airflow/core-concepts/overview.rst
+++ b/docs/apache-airflow/core-concepts/overview.rst
@@ -126,6 +126,8 @@ The meaning of the different connection types in the
diagrams below is as follow
* **black solid lines** represent accessing the UI to manage execution of the
workflows
* **red dashed lines** represent accessing the *metadata database* by all
components
+.. _overview-basic-airflow-architecture:
+
Basic Airflow deployment
........................
@@ -143,6 +145,8 @@ and maintenance are all done by the same person and there
are no security perime
If you want to run Airflow on a single machine in a simple single-machine
setup, you can skip the
more complex diagrams below and go straight to the :ref:`overview:workloads`
section.
+.. _overview-distributed-airflow-architecture:
+
Distributed Airflow architecture
................................
@@ -164,6 +168,8 @@ Helm Chart documentation. Helm chart is one of the ways how
to deploy Airflow in
.. image:: ../img/diagram_distributed_airflow_architecture.png
+.. _overview-separate-dag-processing-airflow-architecture:
+
Separate DAG processing architecture
....................................