[ https://issues.apache.org/jira/browse/AIRFLOW-4747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858753#comment-16858753 ]
Ash Berlin-Taylor commented on AIRFLOW-4747:
--------------------------------------------
AIRFLOW-2761 (PR: https://github.com/apache/airflow/pull/4234/files), which
landed in 1.10.3, might help things a bit, depending on exactly which part is
slow. (Check out the graphs in the PR.)
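One rough way to check whether DAG parsing is the slow part is to time each DAG file directly. The script below is a minimal sketch and not an Airflow tool. It executes each `.py` file in a DAG folder once and times it, on the assumption that module-level code dominates parse cost. It then estimates a best-case parse cycle as total parse time divided by the number of parallel parsing processes. `time_dag_file_imports` and `estimate_cycle_seconds` are hypothetical names. Because every file runs in one interpreter, imports shared between files are paid only once, so later files may look cheaper than they would in a fresh process. The script runs each file's top-level code, so only point it at files that are safe to import, with `airflow` installed.

```python
from __future__ import annotations

import importlib.util
import time
from pathlib import Path


def time_dag_file_imports(dag_folder: str) -> dict[str, float]:
    """Execute each *.py file in dag_folder once; return wall-clock seconds per file.

    Rough proxy for DAG parse cost: only module-level code is measured, and
    imports shared between files are cached after the first file pays for them.
    """
    timings: dict[str, float] = {}
    for path in sorted(Path(dag_folder).glob("*.py")):
        # Build a throwaway module object for the file (not added to sys.modules).
        spec = importlib.util.spec_from_file_location(f"_dagprobe_{path.stem}", path)
        module = importlib.util.module_from_spec(spec)
        start = time.perf_counter()
        spec.loader.exec_module(module)  # runs the file's top-level code
        timings[path.name] = time.perf_counter() - start
    return timings


def estimate_cycle_seconds(per_file_seconds: list[float], parallel_processes: int) -> float:
    """Best-case parse-cycle time: total parse work split evenly across processes."""
    return sum(per_file_seconds) / parallel_processes
```

With hypothetical numbers, 300 DAG files at 1.2 s each spread over 2 parsing processes gives `estimate_cycle_seconds([1.2] * 300, 2)` of about 180 s. That is three minutes, in line with the "several minutes" reported in the issue. In 1.10.x the number of parsing processes is, as far as I know, set by `[scheduler] max_threads`; treat that as an assumption to verify against your version's `airflow.cfg`.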
> Airflow Scheduling and DAG Parsing
> ----------------------------------
>
> Key: AIRFLOW-4747
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4747
> Project: Apache Airflow
> Issue Type: Wish
> Components: scheduler
> Affects Versions: 1.10.2
> Reporter: Michael Smith
> Priority: Major
>
> I read somewhere that there was going to be an attempt to decouple Airflow's
> DAG parsing from its scheduler function. My assumption is that this could be
> achieved, for example, by driving Scheduler actions (almost?) entirely from
> the Airflow database. Would this eliminate the need for a continuously
> running DAG parse process?
> At present we observe significant lag and overhead with the current (1.10.2)
> scheduling model, which appears to be tightly coupled to DAG parsing. In our
> environment, parsing typically takes more than 1 second per DAG, so a single
> DAG parse cycle can take several minutes. DAG parsing is also a large CPU
> overhead (for example, on a single-node cloud VM we've been forced to
> allocate 2 CPUs). In addition, production jobs suffer from fairly large lag
> times between tasks (the time between one task ending and its downstream
> task starting). This can be on the order of minutes, even when task slots
> are available.
>
> Is anyone working on this enhancement, or could anyone provide guidance on
> resolving it? (It may be a configuration issue on our side, but I have
> experimented with configuration options extensively.)
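The database-driven idea raised in the issue can be illustrated with a toy model. This is not Airflow's actual design. A parser step writes each DAG's task dependencies to a database once. A scheduler step then decides which tasks are ready from database rows alone, without importing any DAG file. The sketch uses the standard-library `sqlite3` module. The `dag_structure` and `task_state` tables and all function names are hypothetical.

```python
from __future__ import annotations

import json
import sqlite3

# Hypothetical schema: one row per DAG holding {task_id: [upstream task_ids]}
# as JSON, plus one row per task holding its current state.
SCHEMA = """
CREATE TABLE dag_structure (dag_id TEXT PRIMARY KEY, upstream_json TEXT);
CREATE TABLE task_state (
    dag_id TEXT, task_id TEXT, state TEXT,
    PRIMARY KEY (dag_id, task_id)
);
"""


def store_dag(conn: sqlite3.Connection, dag_id: str, upstream: dict[str, list[str]]) -> None:
    """Parser side: run when a DAG file changes, not on every scheduling loop."""
    conn.execute(
        "INSERT OR REPLACE INTO dag_structure VALUES (?, ?)",
        (dag_id, json.dumps(upstream)),
    )
    for task_id in upstream:
        # OR IGNORE keeps the state of tasks that already exist.
        conn.execute(
            "INSERT OR IGNORE INTO task_state VALUES (?, ?, 'none')",
            (dag_id, task_id),
        )


def ready_tasks(conn: sqlite3.Connection, dag_id: str) -> list[str]:
    """Scheduler side: not-yet-run tasks whose upstream tasks all succeeded."""
    row = conn.execute(
        "SELECT upstream_json FROM dag_structure WHERE dag_id = ?", (dag_id,)
    ).fetchone()
    upstream = json.loads(row[0])
    states = dict(
        conn.execute(
            "SELECT task_id, state FROM task_state WHERE dag_id = ?", (dag_id,)
        ).fetchall()
    )
    return sorted(
        task
        for task, parents in upstream.items()
        if states[task] == "none" and all(states[p] == "success" for p in parents)
    )


def set_state(conn: sqlite3.Connection, dag_id: str, task_id: str, state: str) -> None:
    """Record a task's new state, as an executor callback might."""
    conn.execute(
        "UPDATE task_state SET state = ? WHERE dag_id = ? AND task_id = ?",
        (state, dag_id, task_id),
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    store_dag(conn, "etl", {"extract": [], "transform": ["extract"], "load": ["transform"]})
    print(ready_tasks(conn, "etl"))  # ['extract']
    set_state(conn, "etl", "extract", "success")
    print(ready_tasks(conn, "etl"))  # ['transform']
```

In this model the scheduling decision costs a couple of queries no matter how slow the DAG files are to import. The trade-off is that the stored structure can go stale until the parser step calls `store_dag` again.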
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)