pankajkoti commented on code in PR #32262:
URL: https://github.com/apache/airflow/pull/32262#discussion_r1247754933
########## docs/apache-airflow/installation/index.rst: ##########
@@ -316,3 +327,62 @@ Follow the `Ecosystem <https://airflow.apache.org/ecosystem/>`__ page to find a
 **Where to ask for help**

 * Depends on what the 3rd-party provides. Look at the documentation of the 3rd-party deployment you use.
+
+
+Notes about minimum requirements
+''''''''''''''''''''''''''''''''
+
+There are often questions about minimum requirements for Airflow for Production systems, but it is

Review Comment:
```suggestion
There are often questions about minimum requirements for Airflow for production systems, but it is
```

########## docs/apache-airflow/installation/index.rst: ##########
@@ -316,3 +327,62 @@ Follow the `Ecosystem <https://airflow.apache.org/ecosystem/>`__ page to find a
+The requirements that Airflow might need depend on many factors, including (but not limited to):
+  * The deployment your Airflow is installed with (see above ways of installing Airflow)
+  * The requirements of the Deployment environment (for example Kubernetes, Docker, Helm, etc.) that

Review Comment:
```suggestion
  * The requirements of the deployment environment (for example Kubernetes, Docker, Helm, etc.) that
```

########## docs/apache-airflow/installation/index.rst: ##########
@@ -316,3 +327,62 @@ Follow the `Ecosystem <https://airflow.apache.org/ecosystem/>`__ page to find a
+Also, one of the important things that Manages Services for Airflow provide is that they made a lot
+of opinionated choices and fine-tuned the system for you, so you don't have to worry about it too much.

Review Comment:
```suggestion
of opinionated choices and fine-tune the system for you, so you don't have to worry about it too much.
```

########## docs/apache-airflow/installation/index.rst: ##########
@@ -316,3 +327,62 @@ Follow the `Ecosystem <https://airflow.apache.org/ecosystem/>`__ page to find a
 **Where to ask for help**

 * Depends on what the 3rd-party provides. Look at the documentation of the 3rd-party deployment you use.
+
+
+Notes about minimum requirements
+''''''''''''''''''''''''''''''''
+
+There are often questions about minimum requirements for Airflow for Production systems, but it is
+not possible to give a simple answer to that question.
+
+The requirements that Airflow might need depend on many factors, including (but not limited to):
+  * The deployment your Airflow is installed with (see above ways of installing Airflow)
+  * The requirements of the Deployment environment (for example Kubernetes, Docker, Helm, etc.) that
+    are completely independent from Airflow (for example DNS resources, sharing the nodes/resources
+    with more (or less) pods and containers that are needed that might depend on particular choice of
+    the technology/cloud/integration of monitoring etc. etc.
+  * Technical details of database, hardware, network, etc. that your deployment is running on
+  * The complexity of the code you add to your DAGS, configuration, plugins, settings etc. (note, that
+    Airflow runs the code that DAG author and Deployment Manager provide)
+  * The number and choice of providers you install and use (Airflow has more than 80 providers) that can
+    be installed by choice of the Deployment Manager and using them might require more resources.
+  * The choice of parameters that you use when tuning Airflow. Airflow has many configuration parameters
+    that can fine-tuned to your needs
+  * The number of DagRuns and tasks instances you run with parallel instances of each in consideration
+  * How complex are the tasks you run
+
+The above "DAG" characteristics will change over time and even will change depending on the time of the day
+or week, so you have to be prepared to continuously monitor the system and adjust the parameters to make
+it works smoothly.
+
+While we can provide some specific minimum requirements for some development "quick start" - such as
+in case of our :ref:`running-airflow-in-docker` quick-start guide, it is not possible to provide any minimum
+requirements for production systems.
+
+The best way to think of resource allocation for Airflow instance is to think of it in terms of process
+control theory - where there are two types of systems:
+
+1. Fully predictable, with few knobs and variables, where you can reliably set the values for the
+   knobs and have an easy way to determine the behaviour of the system
+
+2. Complex systems with multiple variables, that are hard to predict and where you need to monitor
+   the system and adjust the knobs continuously to make sure the system is running smoothly.
+
+Airflow (and generally any modern system running usually on cloud services, with multiple layers responsible
+for resources as well multiple parameters to control their behaviour) is a complex system and they fall
+much more in the second category. If you decide to run Airflow in production on your own, you should be
+prepared for the monitor/observe/adjust feedback loop to make sure the system is running smoothly.
+
+Having a good monitoring system that will allow you to monitor the system and adjust the parameters
+is a must to put that in practice.
+
+There are few guidelines that you can use for optimizing your resource usage as well. The
+:ref:`fine-tuning-scheduler` is a good starting point to fine-tune your scheduler, you can also follow
+the :ref:`best_practice` guide to make sure you are using Airflow in the most efficient way.
+
+Also, one of the important things that Manages Services for Airflow provide is that they made a lot
+of opinionated choices and fine-tuned the system for you, so you don't have to worry about it too much.
+With such managed services, there are usually far less numbers knobs to turn and choices to made and one
+of the things you pay for is that the Managed Service provider manages the system for you and provides
+paid support and allows you to scale the system as needed and allocate the right resources - following the
+choices their made when it comes to kinds of deployment you might have.

Review Comment:
```suggestion
choices made there when it comes to the kinds of deployment you might have.
```
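As context for the quoted paragraph about Airflow's many configuration parameters and the :ref:`fine-tuning-scheduler` guidance, here is a minimal sketch of how one might inspect the effective values of a few commonly tuned knobs. It assumes a standard Airflow 2.x installation; the exact section and option names are version-dependent and should be checked against your own deployment.

```python
# Minimal sketch: print the effective values of a few commonly tuned Airflow
# configuration options, so you can see what your deployment actually runs with.
# Assumes Airflow 2.x; section/option names may differ between versions.
from airflow.configuration import conf

knobs = [
    ("core", "parallelism"),               # upper bound on concurrently running task instances
    ("core", "max_active_tasks_per_dag"),  # per-DAG cap on concurrently running task instances
    ("core", "max_active_runs_per_dag"),   # per-DAG cap on active DagRuns
    ("scheduler", "parsing_processes"),    # processes used to parse DAG files
]

for section, option in knobs:
    print(f"[{section}] {option} = {conf.getint(section, option)}")

# Whether StatsD metrics are enabled - useful for the monitor/observe/adjust loop
print("[metrics] statsd_on =", conf.getboolean("metrics", "statsd_on"))
```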
########## docs/apache-airflow/installation/index.rst: ##########
@@ -316,3 +327,62 @@ Follow the `Ecosystem <https://airflow.apache.org/ecosystem/>`__ page to find a
+There are few guidelines that you can use for optimizing your resource usage as well. The
+:ref:`fine-tuning-scheduler` is a good starting point to fine-tune your scheduler, you can also follow
+the :ref:`best_practice` guide to make sure you are using Airflow in the most efficient way.
+
+Also, one of the important things that Manages Services for Airflow provide is that they made a lot

Review Comment:
```suggestion
Also, one of the important things that Managed Services for Airflow provide is that they make a lot
```

Also thinking if it could be `managed services` instead of title cased `Managed Services` across all the occurrences in this file 🤔

########## docs/apache-airflow/installation/index.rst: ##########
@@ -316,3 +327,62 @@ Follow the `Ecosystem <https://airflow.apache.org/ecosystem/>`__ page to find a
+  * The requirements of the Deployment environment (for example Kubernetes, Docker, Helm, etc.) that
+    are completely independent from Airflow (for example DNS resources, sharing the nodes/resources
+    with more (or less) pods and containers that are needed that might depend on particular choice of
+    the technology/cloud/integration of monitoring etc. etc.

Review Comment:
```suggestion
    the technology/cloud/integration of monitoring etc.
```

########## docs/apache-airflow/installation/index.rst: ##########
@@ -316,3 +327,62 @@ Follow the `Ecosystem <https://airflow.apache.org/ecosystem/>`__ page to find a
+Also, one of the important things that Manages Services for Airflow provide is that they made a lot
+of opinionated choices and fine-tuned the system for you, so you don't have to worry about it too much.
+With such managed services, there are usually far less numbers knobs to turn and choices to made and one

Review Comment:
```suggestion
With such managed services, there are usually far less number of knobs to turn and choices to make and one
```
########## docs/apache-airflow/installation/index.rst: ##########
@@ -316,3 +327,62 @@ Follow the `Ecosystem <https://airflow.apache.org/ecosystem/>`__ page to find a
+The requirements that Airflow might need depend on many factors, including (but not limited to):
+  * The deployment your Airflow is installed with (see above ways of installing Airflow)
+  * The requirements of the Deployment environment (for example Kubernetes, Docker, Helm, etc.) that
+    are completely independent from Airflow (for example DNS resources, sharing the nodes/resources

Review Comment:
we're missing the closing `)` for the `(` opened here
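Related to the quoted point about "the number of DagRuns and tasks instances you run with parallel instances of each", a small, hypothetical DAG sketch of the per-DAG knobs that cap that parallelism. The DAG id, schedule, and values are illustrative only, assuming Airflow 2.4+:

```python
# Minimal sketch (assumes Airflow 2.4+): per-DAG knobs that bound how many
# runs and task instances of this one DAG execute in parallel, independently
# of the installation-wide settings such as core.parallelism.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="tuning_example",      # hypothetical DAG id, for illustration only
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
    max_active_runs=1,            # at most one DagRun of this DAG at a time
    max_active_tasks=4,           # at most four task instances of this DAG in parallel
) as dag:
    # eight independent tasks, but only four will run concurrently
    for i in range(8):
        BashOperator(task_id=f"task_{i}", bash_command="sleep 5")
```

Installation-wide limits such as core.parallelism and pool sizes still apply on top of these per-DAG settings.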
