Lzzz666 commented on PR #50371: URL: https://github.com/apache/airflow/pull/50371#issuecomment-2975058197
I found that with the current pre_import implementation (#30495), the performance gains are negligible, probably because it only pre-imports the Airflow modules actually used in each DAG. In theory this should still speed things up, but my benchmarks didn't show any improvement. However, when I modified the pre_import function to preload only the "heavier" third-party libraries (NumPy, pandas, Kubernetes, and Celery), the speed-up became very noticeable.

All of my tests involved parsing the same DAG ten times and measuring `run_duration` (defined as `run_duration = time.monotonic() - proc.start_time`).

1. First test with the original pre-import method
2. Second test with the original pre-import method
3. Modified pre-import that only includes NumPy, pandas, Kubernetes, and Celery
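For readers who want to reproduce the idea, here is a minimal sketch of the modified approach and the timing loop. It is not the code from the PR: `HEAVY_MODULES`, `pre_import_modules`, and `measure_runs` are hypothetical names. It assumes the parsing processes are forked from a parent that has already imported the modules, and it times an in-process callable rather than a real `proc.start_time`.

```python
import importlib
import time

# Hypothetical list, taken from the libraries named in the comment.
HEAVY_MODULES = ("numpy", "pandas", "kubernetes", "celery")


def pre_import_modules(names=HEAVY_MODULES):
    """Import each module once in the parent process.

    Assumption: children are forked afterwards, so they inherit the
    already-populated sys.modules and skip the import cost.
    """
    loaded = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Not installed in this environment; skip it.
            continue
        loaded.append(name)
    return loaded


def measure_runs(parse, repeats=10):
    """Call parse() `repeats` times; return each duration in seconds."""
    durations = []
    for _ in range(repeats):
        start_time = time.monotonic()
        parse()
        durations.append(time.monotonic() - start_time)
    return durations
```

Calling `pre_import_modules()` once before the parse loop, then comparing `measure_runs(...)` with and without it, mirrors the three test setups listed above. Absolute numbers will depend on the environment and on which libraries the DAG file actually imports.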
