potiuk commented on a change in pull request #18356:
URL: https://github.com/apache/airflow/pull/18356#discussion_r712917284
##########
File path: docs/apache-airflow/concepts/scheduler.rst
##########
@@ -138,18 +141,173 @@ The following databases are fully supported and provide
an "optimal" experience:
Microsoft SQLServer has not been tested with HA.
+
+Fine-tuning your Scheduler performance
+--------------------------------------
+
+What impacts the scheduler's performance
+""""""""""""""""""""""""""""""""""""""""
+
+The Scheduler is responsible for two operations:
+
+* continuously parsing DAG files and synchronizing with the DAG in the database
+* continuously scheduling tasks for execution
+
+Those two operations are executed in parallel by the scheduler and run independently of each other in
+different processes. In order to fine-tune your scheduler, you need to consider a number of factors:
+
+* The kind of deployment you have
+ * what kind of filesystem you have to share the DAGs (impacts performance
of continuously reading DAGs)
+ * how fast the filesystem is (in many cases of distributed cloud filesystems you can pay extra to get
+ more throughput or a faster filesystem)
+ * how much memory you have for your processing
+ * how much CPU you have available
+ * how much networking throughput you have available
+
+* The logic and definition of your DAG structure:
+ * how many DAG files you have
+ * how many DAGs you have in your files
+ * how large the DAG files are (remember that the scheduler needs to read and parse each file every n seconds)
+ * how complex they are (i.e. how fast they can be parsed, how many tasks
and dependencies they have)
+ * whether parsing your DAGs involves heavy processing (Hint! It should
not. See :doc:`/best-practices`)
+
+* The scheduler configuration:
+ * how many schedulers you have
+ * how many parsing processes you have in your scheduler
+ * how long the scheduler waits between re-parsing the same DAG (it happens continuously)
+ * how many task instances the scheduler processes in one loop
+ * how many new DAG runs should be created/scheduled per loop
+ * whether to execute the "mini-scheduler" after a completed task to speed up scheduling of dependent tasks
Review comment:
Agree,
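
For reference, here is a rough sketch of how the "scheduler configuration" bullets in the hunk above might map onto `[scheduler]` options in `airflow.cfg`. The option names are the ones I recall from Airflow 2.x and should be checked against the configuration reference for your version; the values are placeholders for illustration, not recommendations. `SAMPLE_CFG` and `load_scheduler_settings` are hypothetical names, and the snippet only uses the standard-library `configparser`, not Airflow itself.

```python
import configparser

# Illustrative fragment only: option names assumed from Airflow 2.x,
# values are placeholders rather than tuned or recommended settings.
SAMPLE_CFG = """
[scheduler]
parsing_processes = 2
min_file_process_interval = 30
max_tis_per_query = 512
max_dagruns_to_create_per_loop = 10
max_dagruns_per_loop_to_schedule = 20
schedule_after_task_execution = True
"""


def load_scheduler_settings(text: str) -> dict:
    """Parse the [scheduler] section of a config string into typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["scheduler"]
    return {
        # number of parsing processes in the scheduler
        "parsing_processes": section.getint("parsing_processes"),
        # seconds to wait before re-parsing the same DAG file
        "min_file_process_interval": section.getint("min_file_process_interval"),
        # task instances handled per scheduling query
        "max_tis_per_query": section.getint("max_tis_per_query"),
        # new DAG runs created / examined per loop
        "max_dagruns_to_create_per_loop": section.getint("max_dagruns_to_create_per_loop"),
        "max_dagruns_per_loop_to_schedule": section.getint("max_dagruns_per_loop_to_schedule"),
        # whether the "mini-scheduler" runs after a task completes
        "schedule_after_task_execution": section.getboolean("schedule_after_task_execution"),
    }


settings = load_scheduler_settings(SAMPLE_CFG)
for name, value in settings.items():
    print(f"{name} = {value}")
```

The number of schedulers is a deployment choice rather than a config option, so it is not shown here.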