Michael Smith created AIRFLOW-4747:
--------------------------------------

             Summary: Airflow Scheduling and DAG Parsing
                 Key: AIRFLOW-4747
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4747
             Project: Apache Airflow
          Issue Type: Wish
          Components: scheduler
    Affects Versions: 1.10.2
            Reporter: Michael Smith


I read somewhere that there was going to be an attempt to decouple Airflow's 
DAG parsing from its scheduler function. My assumption is that this could be 
achieved, for example, by driving scheduler actions (almost?) entirely from 
the Airflow database. Would this eliminate the need for a continuously 
running DAG parse process?

At present we observe significant lag and overhead with the current (1.10.2) 
scheduling model, which appears to be tightly coupled to DAG parsing. In our 
environment, DAG parse times are typically >1 sec per DAG, so a single DAG 
parse cycle can take several minutes. DAG parsing is also a large CPU 
overhead (for example, on a single-node cloud VM we've been forced to 
allocate 2 CPUs). In addition, production jobs suffer from fairly large lag 
times between tasks (the time between the end of one task and the start of 
the follow-on task). This can be on the order of minutes, even when task 
slots are available.
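
To make the arithmetic above concrete, here is a minimal sketch of how per-DAG 
parse time turns into a multi-minute parse cycle. The DAG file counts and the 
1.5 sec figure are hypothetical (only ">1 sec per DAG" is measured), and 
`estimate_parse_cycle_seconds` and `parser_processes` are illustrative names, 
not Airflow APIs. As far as I understand, `parser_processes` corresponds 
roughly to the `max_threads` setting under `[scheduler]` in 1.10.x. The model 
assumes every file is re-parsed once per cycle and that files are spread 
evenly across the parsing processes.

```python
import math


def estimate_parse_cycle_seconds(num_dag_files, seconds_per_file, parser_processes=1):
    """Rough wall-clock estimate for one full pass over all DAG files.

    Assumption: files are split evenly across parser processes, so the
    cycle lasts as long as the busiest process needs for its share.
    """
    if parser_processes < 1:
        raise ValueError("parser_processes must be >= 1")
    files_per_process = math.ceil(num_dag_files / parser_processes)
    return files_per_process * seconds_per_file


if __name__ == "__main__":
    # Hypothetical DAG file counts; 1.5 sec is an assumed per-file parse time.
    for num_files in (100, 200, 400):
        seconds = estimate_parse_cycle_seconds(num_files, 1.5, parser_processes=2)
        print(f"{num_files} DAG files: about {seconds / 60:.1f} min per parse cycle")
```

Under this model, a task that finishes just after its DAG file was parsed may 
wait most of a cycle before the follow-on task is considered, which would be 
consistent with the lag described above.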

 

Is anyone working on this enhancement, or could anyone provide guidance on 
resolving these issues? (This is possibly a configuration issue on our side, 
but I have experimented extensively with the configuration options.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
