[
https://issues.apache.org/jira/browse/AIRFLOW-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623387#comment-16623387
]
Iuliia Volkova commented on AIRFLOW-128:
----------------------------------------
Sorry, my mistake :) this comment was meant for the linked task:
https://issues.apache.org/jira/browse/AIRFLOW-163
> Optimize and refactor process_dag
> ---------------------------------
>
> Key: AIRFLOW-128
> URL: https://issues.apache.org/jira/browse/AIRFLOW-128
> Project: Apache Airflow
> Issue Type: Improvement
> Components: scheduler
> Affects Versions: 1.7.1
> Reporter: Bolke de Bruin
> Priority: Major
>
> process_dag is currently task-instance based and programmatically determines
> which tasks should be part of a "dagrun" (in quotes because it is not a real
> dagrun). This requires a database round trip for every task, easily
> reaching 10-20 per DAG per execution_date on every heartbeat, and more for
> more complex DAGs.
> In addition, the session is not reused within process_dag, so every DAG
> opens 10-20 sessions per execution_date on every heartbeat.
> This is suboptimal. By using dag runs that are instantiated with their
> associated tasks (see AIRFLOW-124), this can be reduced to one round trip per
> dagrun, significantly lowering the pressure on the database. In addition, if
> the database session is used carefully, all of this can be done within one
> session, further lowering database pressure and speeding up the scheduler.
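A minimal sketch of the difference described above, assuming an in-memory sqlite3 database as a stand-in for the metadata DB. The `task_instance` schema, `CountingConnection`, `states_per_task` and `states_per_dagrun` are hypothetical names for illustration only; this is not Airflow's actual process_dag, its models, or its SQLAlchemy session handling. It contrasts one query per task with one query per dag run, and reuses a single connection (the "session") for both instead of opening a new one per task.

```python
import sqlite3

# Hypothetical schema, loosely modelled on a task-instance table (not Airflow's real model).
SCHEMA = """
CREATE TABLE task_instance (
    dag_id TEXT, task_id TEXT, execution_date TEXT, state TEXT,
    PRIMARY KEY (dag_id, task_id, execution_date)
)
"""


class CountingConnection:
    """Wraps one sqlite3 connection and counts executed statements (round trips)."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def states_per_task(db, dag_id, task_ids, execution_date):
    """Per-task pattern: one round trip for every task instance."""
    states = {}
    for task_id in task_ids:
        row = db.execute(
            "SELECT state FROM task_instance "
            "WHERE dag_id = ? AND task_id = ? AND execution_date = ?",
            (dag_id, task_id, execution_date),
        ).fetchone()
        states[task_id] = row[0] if row else None
    return states


def states_per_dagrun(db, dag_id, task_ids, execution_date):
    """Batched pattern: a single round trip fetches every task instance of the run."""
    rows = db.execute(
        "SELECT task_id, state FROM task_instance "
        "WHERE dag_id = ? AND execution_date = ?",
        (dag_id, execution_date),
    ).fetchall()
    found = dict(rows)
    # Fill in None for tasks with no row, matching states_per_task.
    return {task_id: found.get(task_id) for task_id in task_ids}


# Usage: 15 tasks in one hypothetical DAG, one shared connection for both calls.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
task_ids = [f"task_{i}" for i in range(15)]
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
    [("example_dag", t, "2016-05-01", "success") for t in task_ids],
)
db = CountingConnection(conn)

old = states_per_task(db, "example_dag", task_ids, "2016-05-01")
old_queries = db.queries
db.queries = 0
new = states_per_dagrun(db, "example_dag", task_ids, "2016-05-01")
print(old_queries, db.queries)  # 15 1
```

Both functions return the same mapping; only the number of round trips changes. Passing the connection in, rather than having each function open its own, is the sketch's analogue of reusing one session within process_dag.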
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)