Lzzz666 commented on PR #50371:
URL: https://github.com/apache/airflow/pull/50371#issuecomment-2975058197

   I found that with the current pre_import implementation (#30495), the 
performance gains are negligible, probably because it only pre-imports the 
Airflow modules actually used in each DAG. In theory this should still speed 
things up, but my benchmarks didn’t show any improvement. However, when I 
modified the pre_import function to preload only the “heavier” third-party 
libraries (NumPy, pandas, Kubernetes, and Celery), the speed-up became very 
noticeable.
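
To make the modified approach concrete, here is a minimal sketch of preloading a fixed list of heavy libraries in the parent process. It is not the actual patch: `HEAVY_MODULES` and `preload_heavy_modules` are hypothetical names, and it assumes the parsing processes are forked from the parent so that they inherit the already-populated `sys.modules`.

```python
import importlib
import sys

# Hypothetical list, taken from the libraries named in this comment.
HEAVY_MODULES = ("numpy", "pandas", "kubernetes", "celery")


def preload_heavy_modules(names=HEAVY_MODULES):
    """Import each module once in the parent process.

    Assumption: child parsing processes are created with fork, so they
    inherit sys.modules and skip the import cost for these libraries.
    """
    loaded, skipped = [], []
    for name in names:
        try:
            importlib.import_module(name)
            loaded.append(name)
        except ImportError:
            # A library that is not installed is skipped, not treated as fatal.
            skipped.append(name)
    return loaded, skipped


if __name__ == "__main__":
    loaded, skipped = preload_heavy_modules()
    print(f"preloaded: {loaded}, not installed: {skipped}")
```

Skipping missing libraries keeps the preload step safe on deployments that do not install every provider.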
   
   All of my tests involved parsing the same DAG ten times and measuring 
`run_duration` (which is defined as `run_duration = time.monotonic() - 
proc.start_time`).
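
For reference, the measurement can be approximated outside Airflow with a small harness like the one below. This is only a sketch: `time_runs` and `parse_once` are hypothetical names, and the lambda stands in for whatever actually parses the DAG file.

```python
import statistics
import time


def time_runs(parse_once, repeats=10):
    """Call parse_once `repeats` times and return each wall-clock duration."""
    durations = []
    for _ in range(repeats):
        start_time = time.monotonic()
        parse_once()
        # Same formula as run_duration: monotonic "now" minus the start time.
        durations.append(time.monotonic() - start_time)
    return durations


# Placeholder workload; replace with the real DAG-parsing call.
durations = time_runs(lambda: sum(range(100_000)))
print(f"mean={statistics.mean(durations):.4f}s min={min(durations):.4f}s")
```

`time.monotonic()` is used rather than `time.time()` because it is unaffected by system clock adjustments during the run.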
   
   1. First test with the original pre-import method
   
![image](https://github.com/user-attachments/assets/42aa4256-58ec-4304-878d-06f61ccf0fee)
   
   2. Second test with the original pre-import method
   
![image](https://github.com/user-attachments/assets/60935221-9fdc-4948-993c-d5c5c4a75b24)
   
   3. Modified pre-import that includes only NumPy, pandas, Kubernetes, and Celery
   
![image](https://github.com/user-attachments/assets/d2cd44a8-0601-41b4-ba29-2daab6ac96ce)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
