This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
     new f6db66e163 Add information for users who ask for requirements (#32262)
f6db66e163 is described below

commit f6db66e16374e504665972feba0831d4148c6d50
Author: Jarek Potiuk <[email protected]>
AuthorDate: Sat Jul 1 23:56:03 2023 +0200

    Add information for users who ask for requirements (#32262)
    
    * Add information for users who ask for requirements
    
    This change is based on a number of discussions with the users asking
    what are the minimum requirements for Airflow to run.
    
    While we cannot give precise answer, we should also make the users
    aware that simple answers are not possible, and that when they are
    deciding to install airflow and manage it on their own, they also
    take the responsibility to monitor and adjust the resources they
    need based on the monitoring they have to run.
    
    * Apply suggestions from code review
    
    Co-authored-by: Pankaj Koti <[email protected]>
    
    * Update docs/apache-airflow/installation/index.rst
    
    ---------
    
    Co-authored-by: Pankaj Koti <[email protected]>
---
 .../administration-and-deployment/scheduler.rst    |  1 +
 docs/apache-airflow/installation/index.rst         | 72 +++++++++++++++++++++-
 2 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/docs/apache-airflow/administration-and-deployment/scheduler.rst b/docs/apache-airflow/administration-and-deployment/scheduler.rst
index dfc97e6120..1a2a41b136 100644
--- a/docs/apache-airflow/administration-and-deployment/scheduler.rst
+++ b/docs/apache-airflow/administration-and-deployment/scheduler.rst
@@ -154,6 +154,7 @@ The following databases are fully supported and provide an "optimal" experience:
 
   Microsoft SQLServer has not been tested with HA.
 
+.. _fine-tuning-scheduler:
 
 Fine-tuning your Scheduler performance
 --------------------------------------
diff --git a/docs/apache-airflow/installation/index.rst b/docs/apache-airflow/installation/index.rst
index 1ddbf5d66b..8f37ca208d 100644
--- a/docs/apache-airflow/installation/index.rst
+++ b/docs/apache-airflow/installation/index.rst
@@ -77,6 +77,9 @@ More details: :doc:`installing-from-sources`
 * You should develop and handle the deployment for all components of Airflow.
 * You are responsible for setting up database, creating and managing database 
schema with ``airflow db`` commands,
   automated startup and recovery, maintenance, cleanup and upgrades of Airflow 
and the Airflow Providers.
+* You need to set up monitoring of your system that allows you to observe resources and react to problems.
+* You are expected to configure and manage appropriate resources for the installation (memory, CPU, etc.)
+  based on monitoring of your installation and a feedback loop. See the notes about requirements below.
 
 **What Apache Airflow Community provides for that method**
 
@@ -123,6 +126,9 @@ More details:  :doc:`/installation/installing-from-pypi`
 * You should develop and handle the deployment for all components of Airflow.
 * You are responsible for setting up database, creating and managing database 
schema with ``airflow db`` commands,
   automated startup and recovery, maintenance, cleanup and upgrades of Airflow 
and Airflow Providers.
+* You need to set up monitoring of your system that allows you to observe resources and react to problems.
+* You are expected to configure and manage appropriate resources for the installation (memory, CPU, etc.)
+  based on monitoring of your installation and a feedback loop.
 
 **What Apache Airflow Community provides for that method**
 
@@ -181,6 +187,9 @@ and official constraint files- same that are used for installing Airflow from Py
   deployments of containers. You can use your own custom mechanism, custom 
Kubernetes deployments,
   custom Docker Compose, custom Helm charts etc., and you should choose it 
based on your experience
   and expectations.
+* You need to set up monitoring of your system that allows you to observe resources and react to problems.
+* You are expected to configure and manage appropriate resources for the installation (memory, CPU, etc.)
+  based on monitoring of your installation and a feedback loop.
 
 **What Apache Airflow Community provides for that method**
 
@@ -238,6 +247,9 @@ More details: :doc:`helm-chart:index`
   those changes when released by upgrading the base image. However, you are 
responsible in creating a
   pipeline of building your own custom images with your own added dependencies 
and Providers and need to
   repeat the customization step and building your own image when new version 
of Airflow image is released.
+* You need to set up monitoring of your system that allows you to observe resources and react to problems.
+* You are expected to configure and manage appropriate resources for the installation (memory, CPU, etc.)
+  based on monitoring of your installation and a feedback loop.
 
 **What Apache Airflow Community provides for that method**
 
@@ -256,7 +268,6 @@ More details: :doc:`helm-chart:index`
 * If you can provide description of a reproducible problem with Airflow 
software, you can open
   issue at `GitHub issues <https://github.com/apache/airflow/issues>`__
 
-
 Using Managed Airflow Services
 ''''''''''''''''''''''''''''''
 
@@ -316,3 +327,62 @@ Follow the  `Ecosystem <https://airflow.apache.org/ecosystem/>`__ page to find a
 **Where to ask for help**
 
 * Depends on what the 3rd-party provides. Look at the documentation of the 
3rd-party deployment you use.
+
+
+Notes about minimum requirements
+''''''''''''''''''''''''''''''''
+
+There are often questions about minimum requirements for Airflow for 
production systems, but it is
+not possible to give a simple answer to that question.
+
+The resources Airflow needs depend on many factors, including (but not limited to):
+  * The way your Airflow is deployed (see the installation methods above)
+  * The requirements of the deployment environment (for example Kubernetes, Docker, Helm, etc.) that
+    are independent of Airflow (for example DNS resources, or sharing nodes/resources with more or
+    fewer pods and containers), and that might depend on the particular choice of technology, cloud,
+    monitoring integration, etc.
+  * Technical details of the database, hardware, network, etc. that your deployment is running on
+  * The complexity of the code you add to your DAGs, configuration, plugins, settings, etc. (note
+    that Airflow runs the code that the DAG author and Deployment Manager provide)
+  * The number and choice of providers you install and use (Airflow has more than 80 providers);
+    they are installed at the discretion of the Deployment Manager, and using them might require
+    more resources
+  * The choice of parameters you use when tuning Airflow. Airflow has many configuration parameters
+    that can be fine-tuned to your needs
+  * The number of DagRuns and task instances you run, taking into account parallel instances of each
+  * How complex the tasks you run are
+
+The above "DAG" characteristics will change over time, and may even vary with the time of day or
+day of the week, so you have to be prepared to continuously monitor the system and adjust its
+parameters to keep it running smoothly.
+
+While we can provide specific minimum requirements for a development "quick start" - such as in
+our :ref:`running-airflow-in-docker` quick-start guide - it is not possible to provide minimum
+requirements for production systems.
+
+The best way to think of resource allocation for an Airflow instance is in terms of process
+control theory, where there are two types of systems:
+
+1. Fully predictable, with few knobs and variables, where you can reliably set 
the values for the
+   knobs and have an easy way to determine the behaviour of the system
+
+2. Complex systems with multiple variables, which are hard to predict and where you need to monitor
+   the system and adjust the knobs continuously to make sure it is running smoothly.
+
+Airflow (and generally any modern system, usually running on cloud services, with multiple layers
+responsible for resources as well as multiple parameters controlling their behaviour) is a complex
+system, and it falls much more into the second category. If you decide to run Airflow in production
+on your own, you should be prepared for the monitor/observe/adjust feedback loop to make sure the
+system runs smoothly.
+
+A good monitoring system that allows you to observe the system and adjust its parameters is a must
+to put that into practice.
+
+There are a few guidelines you can use to optimize your resource usage as well. The
+:ref:`fine-tuning-scheduler` section is a good starting point to fine-tune your scheduler, and you
+can also follow the :ref:`best_practice` guide to make sure you are using Airflow in the most
+efficient way.
+
+Also, one of the important things that Managed Services for Airflow provide is that they make a lot
+of opinionated choices and fine-tune the system for you, so you don't have to worry about it too
+much. With such managed services there are usually far fewer knobs to turn and choices to make, and
+one of the things you pay for is that the Managed Service provider manages the system for you,
+provides paid support, and allows you to scale the system as needed and allocate the right
+resources - within the kinds of deployment the service offers.
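
As an illustration of the kind of "knobs" the new section talks about, a few of the Airflow
configuration parameters that deployments typically adjust as part of such a feedback loop can be
sketched as an ``airflow.cfg`` fragment. The option names below exist in Airflow's configuration
reference, but the values shown are illustrative assumptions, not recommendations:

```ini
# Illustrative airflow.cfg fragment - tune these based on your own monitoring,
# not on these example values.
[core]
# Maximum number of task instances that can run concurrently per scheduler.
parallelism = 32
# Maximum number of tasks allowed to run concurrently within a single DAG.
max_active_tasks_per_dag = 16

[scheduler]
# Minimum interval (in seconds) between re-parsing the same DAG file.
min_file_process_interval = 30
# Number of parallel processes used to parse DAG files.
parsing_processes = 2
```

Which parameters matter, and what values work, depend entirely on your deployment and workload,
which is exactly the point the section above makes: monitor first, then adjust.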
