vandonr-amz opened a new pull request, #30495:
URL: https://github.com/apache/airflow/pull/30495

   Imports are evaluated each time we parse a DAG because parsing happens in a 
separate process, so the module cache is not shared (and is lost when the 
process is destroyed after it has parsed the DAG).
   Importing user dependencies in a separate process is good because it 
provides isolation, but airflow dependencies are something we can pre-import to 
save time.
   By doing it before we fork, we only have to do it once; it's then in the 
cache of all child processes.
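   A minimal sketch of why this works (not the PR's code): with the `fork` 
start method, a child process inherits the parent's `sys.modules`, so anything 
imported before the fork is already cached in the child.

```python
import importlib
import multiprocessing
import sys


def check_cached(name: str, q) -> None:
    # A forked child inherits the parent's sys.modules cache, so anything
    # pre-imported before the fork is already present without re-importing.
    q.put(name in sys.modules)


if __name__ == "__main__":
    importlib.import_module("json")  # pre-import once, in the parent
    ctx = multiprocessing.get_context("fork")  # fork is POSIX-only
    q = ctx.Queue()
    p = ctx.Process(target=check_cached, args=("json", q))
    p.start()
    assert q.get() is True  # the child sees the module already cached
    p.join()
```

   (`json` stands in for an airflow module here; the same applies to any 
module imported before the fork.)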
   
   I'm proposing this code, in which I read the Python file, extract the 
imports that reference airflow modules, and import the result using importlib 
in the processor, just before we spawn the process that's going to execute the 
dag file.
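   The extraction step could be sketched like this (a hypothetical 
`preimport_matching` helper, not the PR's actual implementation): parse the 
DAG source with `ast`, collect top-level imports under a given prefix, and 
import them in the parent.

```python
import ast
import importlib


def preimport_matching(dag_source: str, prefix: str = "airflow") -> list:
    """Import every module under `prefix` that the DAG source imports.

    Hypothetical sketch of the approach: walk the AST, collect matching
    import targets, and import them in the parent process so forked
    children inherit them via the module cache.
    """
    imported = []
    for node in ast.walk(ast.parse(dag_source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            if name == prefix or name.startswith(prefix + "."):
                try:
                    importlib.import_module(name)
                    imported.append(name)
                except ImportError:
                    # Leave anything unresolvable to the child process.
                    pass
    return imported
```

   Importing only the modules the DAG actually names keeps the parent from 
paying for the whole airflow package up front.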
   
   For simple DAGs that just define a couple of operators and plug them 
together, this showed a ~60% reduction in the time it takes to process a DAG 
file, from around 300 ms to around 100 ms on my machine (YMMV).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
