Gregory Benison created AIRFLOW-1614:
----------------------------------------

             Summary: Improve performance of DAG parsing when there are many 
subdags
                 Key: AIRFLOW-1614
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1614
             Project: Apache Airflow
          Issue Type: Improvement
            Reporter: Gregory Benison


DAGs can be very slow to parse when they contain many (100s or 1000s) of 
subdags.  This can be illustrated using the following DAG definition file:
{code}from datetime import datetime, timedelta
from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator

dag = DAG(
    'subdaggy-2',
    schedule_interval=None,
    start_date=datetime(2017,1,1)
)

def make_sub_dag(parent_dag, N):
    dag = DAG(
        '%s.task_%d' % (parent_dag.dag_id, N),
        schedule_interval=parent_dag.schedule_interval,
        start_date=parent_dag.start_date
        )
    DummyOperator(task_id='task1', dag=dag) >> DummyOperator(task_id='task2', 
dag=dag)
    return dag

downstream_task = DummyOperator(task_id='downstream', dag=dag)
for N in range(20):
    SubDagOperator(
        dag=dag,
        task_id='task_%d' % N,
        subdag=make_sub_dag(dag, N)
        ) >> downstream_task
{code}

When there are more than 50 or so subdags this file becomes slow enough to 
parse that it fails to load in the web UI on a modest platform such as a laptop.

It would be nice to support such DAGs, since there are useful workflows 
involving 100s or 1000s of subdags.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to