[ https://issues.apache.org/jira/browse/AIRFLOW-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623387#comment-16623387 ]

Iuliia Volkova commented on AIRFLOW-128:
----------------------------------------

Sorry, my mistake: that comment was meant for the linked task,
https://issues.apache.org/jira/browse/AIRFLOW-163

> Optimize and refactor process_dag
> ---------------------------------
>
>                 Key: AIRFLOW-128
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-128
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: scheduler
>    Affects Versions: 1.7.1
>            Reporter: Bolke de Bruin
>            Priority: Major
>
> process_dag is currently task-instance based and programmatically determines
> which tasks should be part of a "dagrun" (in quotes, as it is not a real
> dagrun). This requires a round trip to the database for every task, easily
> reaching 10-20 round trips per dag per execution_date on every heartbeat, or
> even more for complex dags.
> In addition, the session is not reused within process_dag, so for every dag
> it opens 10-20 sessions per execution_date on every heartbeat.
> This is suboptimal. Using dag runs that are instantiated with their
> associated tasks (see AIRFLOW-124), this can be reduced to one round trip per
> dagrun, significantly lowering the pressure on the db. In addition, with
> careful use of the database session, it can all be done within one session,
> further lowering the db pressure and speeding up the scheduler.
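
The difference between one round trip per task and one round trip per dagrun
can be illustrated with a minimal sketch. It uses Python's built-in sqlite3
module rather than Airflow's actual models or scheduler code; the table layout
and the names make_db, states_per_task and states_per_dagrun are hypothetical
and exist only for illustration.

```python
import sqlite3


def make_db():
    """Build an in-memory table of 15 task instances for one dag run (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance ("
        "dag_id TEXT, execution_date TEXT, task_id TEXT, state TEXT)"
    )
    rows = [("example_dag", "2016-05-01", f"task_{i}", "queued") for i in range(15)]
    conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def states_per_task(conn, dag_id, execution_date, task_ids):
    """One query per task: the number of round trips grows with the number of tasks."""
    queries = 0
    states = {}
    for task_id in task_ids:
        row = conn.execute(
            "SELECT state FROM task_instance "
            "WHERE dag_id = ? AND execution_date = ? AND task_id = ?",
            (dag_id, execution_date, task_id),
        ).fetchone()
        queries += 1
        states[task_id] = row[0] if row else None
    return states, queries


def states_per_dagrun(conn, dag_id, execution_date):
    """One query for the whole dag run, issued on the same connection."""
    rows = conn.execute(
        "SELECT task_id, state FROM task_instance "
        "WHERE dag_id = ? AND execution_date = ?",
        (dag_id, execution_date),
    ).fetchall()
    return dict(rows), 1
```

Both functions return the same mapping of task_id to state; only the number of
queries differs (15 versus 1 for the sample data). Reusing the single conn
object stands in for reusing one database session across the whole pass.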



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
