[ https://issues.apache.org/jira/browse/AIRFLOW-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547113#comment-16547113 ]

Kevin Yang edited comment on AIRFLOW-2762 at 7/17/18 9:13 PM:
--------------------------------------------------------------

[~ashb] Thanks a lot for sharing your opinions. I think that is a good idea, 
since it will also provide some consistency between the scheduler and the 
webserver. To do that, though, we need to store more of the info the webserver 
needs in the DagModel, e.g. the dependencies. I am also not sure how much extra 
load that would place on the DB. If we go this route, we might want to build a 
DAG parsing component that parses DAGs for both the scheduler and the 
webserver. Before we decide to do that, we can try to parallelize the parsing 
on the webserver; that work can be reused once we have the DAG parsing 
service, since in both cases the webserver will use serializable info about 
the DAG instead of the DAG object. 
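A minimal sketch of the parallel-parsing idea, assuming only Python's standard 
concurrent.futures and ast modules. Every name below (list_dag_files, 
summarize_source, summarize_dag_file, parse_all, to_json) is hypothetical and 
not part of Airflow. Real DagBag parsing imports each file and builds DAG 
objects; this sketch instead scans the source statically for string 
dag_id=/task_id= keyword arguments, just to stay self-contained. What it 
illustrates is that each worker returns a plain dict rather than a DAG object, 
so results can cross process boundaries and be serialized.

```python
import ast
import json
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Dict, List, Optional


def list_dag_files(folder: str) -> List[str]:
    """Collect candidate DAG files (*.py) under a folder, sorted for a stable order."""
    paths: List[str] = []
    for root, _dirs, files in os.walk(folder):
        paths.extend(os.path.join(root, name) for name in files if name.endswith(".py"))
    return sorted(paths)


def summarize_source(source: str, fileloc: str) -> Dict[str, object]:
    """Hypothetical stand-in for real DAG parsing (not Airflow's algorithm).

    Scans the syntax tree for string-valued `dag_id=` / `task_id=` keyword
    arguments only (positional ids are not detected), and returns a plain
    dict that pickles and JSON-encodes cleanly.
    """
    dag_ids: List[str] = []
    task_ids: List[str] = []
    for node in ast.walk(ast.parse(source, filename=fileloc)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            value = kw.value
            if isinstance(value, ast.Constant) and isinstance(value.value, str):
                if kw.arg == "dag_id":
                    dag_ids.append(value.value)
                elif kw.arg == "task_id":
                    task_ids.append(value.value)
    return {"fileloc": fileloc, "dag_ids": dag_ids, "task_ids": task_ids}


def summarize_dag_file(path: str) -> Dict[str, object]:
    """Worker: read one file and return its serializable summary."""
    with open(path, encoding="utf-8") as f:
        return summarize_source(f.read(), path)


def parse_all(paths: List[str], use_processes: bool = True,
              max_workers: Optional[int] = None) -> List[Dict[str, object]]:
    """Parse files in parallel; Executor.map yields results in input order.

    Processes sidestep the GIL for CPU-bound parsing; threads are a lighter
    option for tests or I/O-heavy folders.
    """
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with pool_cls(max_workers=max_workers) as pool:
        return list(pool.map(summarize_dag_file, paths))


def to_json(summaries: List[Dict[str, object]]) -> str:
    """Plain dicts serialize directly; json.dumps raises TypeError on arbitrary objects."""
    return json.dumps(summaries, sort_keys=True)
```

For example, parse_all(list_dag_files("/path/to/dags")) (placeholder path) 
would parse a folder with a process pool. With processes, call it under an 
`if __name__ == "__main__":` guard, since spawned workers re-import the main 
module.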


was (Author: yrqls21):
[~ashb] Thanks for the opinions. I think that is a good idea, since it will 
also provide some consistency between the scheduler and the webserver. To do 
that, though, we need to store more of the info the webserver needs in the 
DagModel, e.g. the dependencies. I am also not sure how much extra load that 
would place on the DB. If we go this route, we might want to build a DAG 
parsing component that parses DAGs for both the scheduler and the webserver. 
Before we decide to do that, we can try to parallelize the parsing on the 
webserver; that work can be reused once we have the DAG parsing service, since 
in both cases the webserver will use serializable info about the DAG instead 
of the DAG object. 

> Parallelize DAG parsing in webserver
> ------------------------------------
>
>                 Key: AIRFLOW-2762
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2762
>             Project: Apache Airflow
>          Issue Type: Improvement
>            Reporter: Kevin Yang
>            Priority: Major
>
> Currently the webserver parses the DagBag in a single thread, which makes 
> startup slow when there is a large number of DAG files. The webserver should 
> not need the actual DAG objects, and this parsing should be parallelized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
