coufon commented on issue #5594: [AIRFLOW-4924] Loading DAGs asynchronously in 
Airflow webserver
URL: https://github.com/apache/airflow/pull/5594#issuecomment-511892627
 
 
   > I like the idea. As Ash Berlin-Taylor mentioned in JIRA, it likely won't 
be needed in this form once we implement DAG persistence / the stateless 
webserver. However, it sounds like a nice intermediate solution (and a 
rather small incremental change, as opposed to a big structural change in 
Airflow itself) until we have worked out all the details for those. It could even 
be cherry-picked if we ever attempt to release 1.10.5.
   > 
   > Sounds like the idea of stringifying the DAGs is interesting and might be 
used as a starting point to implement part of DAG persistence (no matter how it 
will be implemented eventually). Of course, it only covers the DAG's Python code 
text and 'structure'; there is no way this can be used to actually execute 
the DAGs. Still, it serves the purpose well when you want fast loading of many 
DAGs as Python objects and need many DAG objects in the 
same process (UI/scheduler). I like the idea of having "stringified" and 
"real" versions of DAGs: one for structure/code and one for actual execution. 
Sounds like an interesting optimisation which is pretty independent from any 
other Airflow features.
   > 
   > And maybe we can benefit from the fact that this solution is already available in 
Composer (as alpha) and can be tested on a wide variety of 
configurations (especially since it is aimed at deployments with a large number of 
DAGs, and I assume customers will only enable it when they have a large number of 
DAGs with complex structure). It's very valuable for the Airflow community to get 
code that has already been battle-tested. Zhou Fang, maybe you can share 
your experience with actual "production" usage of this?
   > 
   > What I do not fully understand yet about the current implementation is 
the casting to BaseOperator for non-Airflow modules. Zhou Fang, can you maybe 
explain a bit why this is needed?
   > 
   > I understand that the stringified DAGs must be picklable so they can be sent over 
a multiprocessing Queue, and then the UI can create the objects and 
show the structure and code. The same goes for the scheduler: we only want to use the DAG 
code to schedule it. Maybe I am missing something, but I am not sure why this 
BaseOperator casting is needed in this case, as long as all the custom classes 
are also loaded in the webserver/scheduler.
   
   Thanks Jarek for the comments. 
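
To make the pickling point in the quoted comment concrete, here is a minimal sketch of what "stringifying" could look like. It is only an illustration under stated assumptions, not the code in this PR: `MyCustomOperator`, `stringify_task`, `stringify_dag` and `roundtrip` are hypothetical names, and no Airflow classes are used. The idea shown is that a custom class is reduced to built-in types (its name is kept as text), so the process that unpickles the result does not need to import that class.

```python
import pickle


class MyCustomOperator:
    """Hypothetical user-defined operator; stands in for a non-Airflow class."""

    def __init__(self, task_id, downstream=()):
        self.task_id = task_id
        self.downstream = list(downstream)


def stringify_task(task):
    # Keep only built-in types: the class survives as a name, not as an object.
    return {
        "task_id": task.task_id,
        "task_type": type(task).__name__,
        "downstream_task_ids": list(task.downstream),
    }


def stringify_dag(dag_id, tasks):
    # Structure only: enough to display the DAG, not enough to execute it.
    return {"dag_id": dag_id, "tasks": {t.task_id: stringify_task(t) for t in tasks}}


def roundtrip(obj):
    # multiprocessing.Queue pickles on put() and unpickles on get(); a pickle
    # round trip mimics that here without starting a second process.
    return pickle.loads(pickle.dumps(obj))


tasks = [MyCustomOperator("extract", downstream=["load"]), MyCustomOperator("load")]
received = roundtrip(stringify_dag("example_dag", tasks))
print(received["tasks"]["extract"]["task_type"])  # MyCustomOperator
```

Because the payload contains only dicts, lists and strings, it unpickles in any process, whether or not the custom operator module is importable there.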
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
