yesemsanthoshkumar commented on a change in pull request #10531:
URL: https://github.com/apache/airflow/pull/10531#discussion_r479822281



##########
File path: docs/scheduler.rst
##########
@@ -18,11 +18,41 @@
 Scheduler
 ==========
 
-The Airflow scheduler monitors all tasks and DAGs, then triggers the
-task instances once their dependencies are complete. Behind the scenes,
-the scheduler spins up a subprocess, which monitors and stays in sync with all
-DAGs in the specified DAG directory. Once per minute, by default, the scheduler
-collects DAG parsing results and checks whether any active tasks can be 
triggered.
+The scheduler is the core component in Airflow that is responsible for 
monitoring all tasks and DAGs and triggers the task instances once their 
dependencies are complete. And for this reason, it is imperative to learn about 
the working of the scheduler.
+
+Scheduling in airflow involves the 3 following components.
+
+1. ``DAGFileProcessor`` - Responsible for parsing the DAG definition in a file 
and creating the necessary DAG runs and TaskInstances
+
+2. ``DAGFileProcessorManager`` - Responsible for listing the files in the 
DagBag and creating new DAGFileProcessors when required
+
+3. ``SchedulerJob`` - Responsible for sending the TaskInstances to the executor
+
+Logic behind scheduling is as follows:
+
+1. Airflow kicks off DAGFileProcessorManager along with the SchedulerJob. The 
DAGFileProcessorManager enumerates all the files in the DAG directory.
+
+2. The DAGFileProcessorManager spawns child processes known as 
DAGFileProcessor which is responsible for creating the necessary DAG Runs and 
TaskInstances. When it determines that task instances should run, it updates 
their state to ``SCHEDULED``. The number of DAGFileProcessor to spawn is 
configurable via ``num_runs`` in ``airflow.cfg``. TODO: Check if the 
configuration is correct

Review comment:
       So num_runs is the number of iterations the scheduler makes over the 
DagBag, after which the scheduler shuts down gracefully?
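
       To make that reading concrete, here is a minimal sketch. It is a toy 
model of the interpretation in the question, not Airflow's scheduler code: 
run_scheduler_loop and one_pass are hypothetical names, and treating a 
negative value as "no limit" is an assumption of the sketch.

```python
def run_scheduler_loop(num_runs, one_pass):
    """Toy model of the reading above; not Airflow's implementation.

    Calls one_pass() repeatedly and stops after num_runs passes.
    A negative num_runs is treated as "no limit" (an assumption of this
    sketch); in that case the loop ends only when one_pass() returns False.
    """
    completed = 0
    while num_runs < 0 or completed < num_runs:
        keep_going = one_pass()
        completed += 1
        if keep_going is False:
            break
    return completed


# Hypothetical stand-in for "go over the DAG files once".
passes = []
completed = run_scheduler_loop(3, lambda: passes.append("parsed DAG files"))
print(completed)  # 3
```

       Under this reading, the setting bounds how long the loop runs rather 
than how many DAGFileProcessor processes are spawned, which is the point the 
TODO in the hunk needs to settle.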



