vandonr-amz opened a new pull request, #30495: URL: https://github.com/apache/airflow/pull/30495
Imports are re-evaluated each time we parse a dag file because parsing happens in a separate process, so the module cache is not shared (and is lost when the process is destroyed after having parsed the dag). Importing user dependencies in a separate process is good because it provides isolation, but airflow dependencies are something we can pre-import to save time. By doing it before we fork, we only have to do it once; it is then in the cache of all child processes.

I'm proposing this code where I read the python file, extract the imports that concern airflow modules, and import the result using importlib in the processor, just before we spawn the process that's going to execute the dag file.

For simple dags that just define a couple of operators and plug them together, this showed a ~60% reduction in the time it takes to process a dag file, from around 300ms to around 100ms on my machine (ymmv).
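For illustration, here is a minimal sketch of that approach; the helper names are hypothetical and not necessarily what the PR uses. It collects `airflow.*` imports from the dag file with `ast` and imports them with `importlib` in the parent process, so forked children inherit the populated module cache:

```python
import ast
import importlib
import logging

log = logging.getLogger(__name__)


def _iter_airflow_imports(file_path: str):
    """Yield names of airflow modules imported by the given python file."""
    with open(file_path, "rb") as f:
        tree = ast.parse(f.read(), filename=file_path)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "airflow" or alias.name.startswith("airflow."):
                    yield alias.name
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.startswith("airflow."):
                yield node.module


def _pre_import_airflow_modules(file_path: str) -> None:
    """Import airflow modules found in the dag file in the parent process,
    before forking, so the module cache is shared with child processes."""
    for module in _iter_airflow_imports(file_path):
        try:
            importlib.import_module(module)
        except ModuleNotFoundError:
            # e.g. "from airflow.x import y" where y is itself a module is
            # missed by this heuristic; errors are non-fatal because the
            # child process will import whatever it needs anyway.
            log.warning("Error when trying to pre-import module %r", module)
```

In this sketch, `_pre_import_airflow_modules(file_path)` would be called once in the dag processor right before spawning the child process that executes the dag file; failures are logged rather than raised since pre-importing is purely an optimization.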
