yesemsanthoshkumar commented on a change in pull request #10531:
URL: https://github.com/apache/airflow/pull/10531#discussion_r479822281
##########
File path: docs/scheduler.rst
##########
@@ -18,11 +18,41 @@
Scheduler
==========
-The Airflow scheduler monitors all tasks and DAGs, then triggers the
-task instances once their dependencies are complete. Behind the scenes,
-the scheduler spins up a subprocess, which monitors and stays in sync with all
-DAGs in the specified DAG directory. Once per minute, by default, the scheduler
-collects DAG parsing results and checks whether any active tasks can be triggered.
+The scheduler is the core component in Airflow that is responsible for monitoring all tasks and DAGs and triggers the task instances once their dependencies are complete. And for this reason, it is imperative to learn about the working of the scheduler.
+
+Scheduling in airflow involves the 3 following components.
+
+1. ``DAGFileProcessor`` - Responsible for parsing the DAG definition in a file and creating the necessary DAG runs and TaskInstances
+
+2. ``DAGFileProcessorManager`` - Responsible for listing the files in the DagBag and creating new DAGFileProcessors when required
+
+3. ``SchedulerJob`` - Responsible for sending the TaskInstances to the executor
+
+Logic behind scheduling is as follows:
+
+1. Airflow kicks off DAGFileProcessorManager along with the SchedulerJob. The DAGFileProcessorManager enumerates all the files in the DAG directory.
+
+2. The DAGFileProcessorManager spawns child processes known as DAGFileProcessor which is responsible for creating the necessary DAG Runs and TaskInstances. When it determines that task instances should run, it updates their state to ``SCHEDULED``. The number of DAGFileProcessor to spawn is configurable via ``num_runs`` in ``airflow.cfg``. TODO: Check if the configuration is correct
Review comment:
So num_runs is actually the number of iterations the scheduler makes over
the DagBag, after which the scheduler shuts down gracefully?
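
To pin down the reading I am asking about, here is a toy sketch of it. It is
not Airflow's actual scheduler code: `run_scheduler_loop` and
`parse_dag_files` are made-up names for illustration, and treating `-1` as
"no limit" is my assumption about how the setting behaves.

```python
import itertools


def run_scheduler_loop(num_runs, parse_dag_files):
    """Toy stand-in for a scheduler loop; NOT Airflow's implementation.

    num_runs: maximum number of passes over the DAG files
              (assumption: -1 means "no limit").
    parse_dag_files: hypothetical callable invoked once per pass.
    Returns the number of passes completed before exiting.
    """
    completed = 0
    # itertools.count() yields 0, 1, 2, ... so the loop is unbounded
    # unless the num_runs check below stops it.
    for iteration in itertools.count():
        if num_runs != -1 and iteration >= num_runs:
            break  # limit reached: leave the loop and shut down
        parse_dag_files()
        completed += 1
    return completed


if __name__ == "__main__":
    passes = run_scheduler_loop(3, lambda: None)
    print(f"scheduler exited after {passes} passes")
```

Under this reading, num_runs would bound how long the scheduler lives rather
than how many DAGFileProcessor processes get spawned, so the sentence in the
diff would need to point at a different setting.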