This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
     new ef1498831a Add section about live-upgrading Airflow (#36637)
ef1498831a is described below

commit ef1498831a54c76a928d0a56eaf4c232925c0c65
Author: Jarek Potiuk <[email protected]>
AuthorDate: Sat Jan 6 21:42:07 2024 +0100

    Add section about live-upgrading Airflow (#36637)
    
    Our users often ask about live-upgrading Airflow, and the answer to
    what can be live-upgraded and how is not obvious. It depends on a
    number of factors - most importantly on the type of deployment you run
    and the type of executor you use.
    
    This PR adds a basic description for it - following the recent update
    explaining the different live-upgrade scenarios available.
---
 .../production-deployment.rst                      | 68 +++++++++++++++++++++-
 docs/apache-airflow/core-concepts/overview.rst     |  6 ++
 2 files changed, 73 insertions(+), 1 deletion(-)

diff --git 
a/docs/apache-airflow/administration-and-deployment/production-deployment.rst 
b/docs/apache-airflow/administration-and-deployment/production-deployment.rst
index 6f414ed194..281dc7f658 100644
--- 
a/docs/apache-airflow/administration-and-deployment/production-deployment.rst
+++ 
b/docs/apache-airflow/administration-and-deployment/production-deployment.rst
@@ -83,7 +83,6 @@ See :doc:`logging-monitoring/logging-tasks` for 
configurations.
     The logs only appear in your DFS after the task has finished. You can view 
the logs while the task is
     running in UI itself.
 
-
 Configuration
 =============
 
@@ -126,6 +125,73 @@ Helm Chart for Kubernetes
 `Helm <https://helm.sh/>`__ provides a simple mechanism to deploy software to 
a Kubernetes cluster. We maintain
 :doc:`an official Helm chart <helm-chart:index>` for Airflow that helps you 
define, install, and upgrade deployment. The Helm Chart uses :doc:`our official 
Docker image and Dockerfile <docker-stack:index>` that is also maintained and 
released by the community.
 
+
+Live-upgrading Airflow
+======================
+
+Airflow is by design a distributed system, and while the
+:ref:`basic Airflow deployment <overview-basic-airflow-architecture>` usually 
requires a complete Airflow
+restart to upgrade, it is possible to upgrade Airflow without any downtime 
when you run Airflow in a
+:ref:`distributed deployment <overview-distributed-airflow-architecture>`.
+
+Such a live upgrade is possible when there are no changes in the Airflow 
metadata database schema,
+so you should aim to do it when you upgrade between patch-level (bugfix) 
versions of the same minor
+Airflow version, or when upgrading between adjacent minor (feature) versions of 
Airflow after reviewing the
+:doc:`release notes <../release_notes>` and :doc:`../migrations-ref` and 
making sure there are no changes
+in the database schema between them.
+
+In some cases, when the database migration is not significant, such a live 
migration could also potentially be
+possible by upgrading the Airflow database first, even between MINOR versions. 
However, this is not recommended
+and you should only do it at your own risk, carefully reviewing the 
modifications to be applied to the
+database schema and assessing the risk of such an upgrade. It requires deep 
knowledge of the Airflow
+database :doc:`../database-erd-ref` and reviewing the 
:doc:`../migrations-ref`. You should always thoroughly
+test such an upgrade in a staging environment first. Usually the cost of 
preparing such a live upgrade
+will be higher than the cost of a short downtime of Airflow, so we strongly 
discourage such live upgrades.
+
+Make sure to test any live upgrade procedure in a staging environment before 
you do it in production,
+to avoid any surprises and side effects.
+
+When it comes to live-upgrading the ``Webserver`` and ``Triggerer`` components, 
if you run them in separate
+environments and have more than one instance of each of them, you can 
rolling-restart them one by one,
+without any downtime. This should usually be done as the first step in your 
upgrade procedure.
+
+When you are running a deployment with a separate ``DAG processor``, in a
+:ref:`Separate DAG processing deployment 
<overview-separate-dag-processing-airflow-architecture>`,
+the ``DAG processor`` is not horizontally scaled - even if you have more of 
them there is usually one
+``DAG processor`` running at a time per specific folder, so you can just stop 
it and start the new one.
+Since the ``DAG processor`` is not a critical component, it is OK for it to 
experience a short downtime.
+
+When it comes to upgrading the schedulers and workers, you can use the live 
upgrade capabilities
+of the executor you use:
+
+* For the :doc:`Local executor <../core-concepts/executor/local>` your tasks 
are running as subprocesses of
+  the scheduler and you cannot upgrade the scheduler without killing the tasks run 
by it. You can either
+  pause all your DAGs and wait for the running tasks to complete or just stop 
the scheduler and kill all
+  the tasks it runs - then you will need to clear and restart those tasks 
manually after the upgrade
+  is completed (or rely on ``retries`` being set for stopped tasks).
+
+* For the :doc:`Celery executor <../core-concepts/executor/celery>`, you have 
to first put your workers in
+  offline mode (usually by sending a single ``TERM`` signal to the workers), 
wait until the workers
+  finish all the running tasks, and then upgrade the code (for example by 
replacing the image the workers run
+  in and restarting the workers). You can monitor your workers via the ``flower`` 
monitoring tool and see the number
+  of running tasks going down to zero. Once the workers are upgraded, they 
will be automatically put in online
+  mode and start picking up new tasks. You can then upgrade the ``Scheduler`` 
in a rolling restart mode.
+
+* For the :doc:`Kubernetes executor <../core-concepts/executor/kubernetes>`, 
you can upgrade the scheduler,
+  triggerer and webserver in a rolling restart mode, and generally you should not 
worry about the workers, as they
+  are managed by the Kubernetes cluster and will be automatically adopted by 
``Schedulers`` when they are
+  upgraded and restarted.
+
+* For the :doc:`CeleryKubernetesExecutor 
<../core-concepts/executor/celery-kubernetes>`, you follow the
+  same procedure as for the ``CeleryExecutor`` - you put the workers in 
offline mode, wait for the running
+  tasks to complete, upgrade the workers, and then upgrade the scheduler, 
triggerer and webserver in a
+  rolling restart mode - which should also adopt tasks run via the 
``KubernetesExecutor`` part of the
+  executor.
+
+Most of the rolling-restart upgrade scenarios are implemented in the 
:doc:`helm-chart:index`, so you can
+use it to upgrade your Airflow deployment without any downtime - especially 
when you do patch-level
+upgrades of Airflow.
+
 .. _production-deployment:kerberos:
 
 Kerberos-authenticated workers
diff --git a/docs/apache-airflow/core-concepts/overview.rst 
b/docs/apache-airflow/core-concepts/overview.rst
index 48487cace8..9ee9ea8b0b 100644
--- a/docs/apache-airflow/core-concepts/overview.rst
+++ b/docs/apache-airflow/core-concepts/overview.rst
@@ -126,6 +126,8 @@ The meaning of the different connection types in the 
diagrams below is as follow
 * **black solid lines** represent accessing the UI to manage execution of the 
workflows
 * **red dashed lines** represent accessing the *metadata database* by all 
components
 
+.. _overview-basic-airflow-architecture:
+
 Basic Airflow deployment
 ........................
 
@@ -143,6 +145,8 @@ and maintenance are all done by the same person and there 
are no security perime
 If you want to run Airflow on a single machine in a simple single-machine 
setup, you can skip the
 more complex diagrams below and go straight to the :ref:`overview:workloads` 
section.
 
+.. _overview-distributed-airflow-architecture:
+
 Distributed Airflow architecture
 ................................
 
@@ -164,6 +168,8 @@ Helm Chart documentation. Helm chart is one of the ways how 
to deploy Airflow in
 
 .. image:: ../img/diagram_distributed_airflow_architecture.png
 
+.. _overview-separate-dag-processing-airflow-architecture:
+
 Separate DAG processing architecture
 ....................................
 

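The eligibility rule from the "Live-upgrading Airflow" section in the diff above can be sketched as a small pre-upgrade check. This is only an illustration, not part of Airflow: the names ``Version``, ``parse_version`` and ``live_upgrade_advice`` are hypothetical, the ``schema_changed`` flag is assumed to be something you determine yourself from the release notes and the migrations reference, and the version strings in the usage note are arbitrary inputs rather than statements about real Airflow releases.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """Hypothetical helper type: a plain MAJOR.MINOR.PATCH version."""

    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse 'MAJOR.MINOR.PATCH'; pre-release suffixes are not handled."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def live_upgrade_advice(current: str, target: str, schema_changed: bool) -> str:
    """Return a rough recommendation following the rule in the docs text.

    schema_changed is an assumption supplied by the caller after reviewing
    the release notes and the migrations reference; it is not detected here.
    """
    cur, new = parse_version(current), parse_version(target)
    if new <= cur:
        return "not-an-upgrade"
    if schema_changed:
        # The docs strongly discourage live upgrades across schema changes.
        return "plan-downtime"
    same_minor = (cur.major, cur.minor) == (new.major, new.minor)
    adjacent_minor = cur.major == new.major and new.minor - cur.minor == 1
    if same_minor or adjacent_minor:
        return "live-upgrade-candidate"
    # Anything further apart is outside the cases the docs describe.
    return "plan-downtime"
```

For example, ``live_upgrade_advice("2.8.0", "2.8.1", schema_changed=False)`` yields ``"live-upgrade-candidate"``, while the same call with ``schema_changed=True`` yields ``"plan-downtime"``. A candidate result still needs testing in a staging environment, as the section recommends.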