Lzzz666 commented on PR #50371:
URL: https://github.com/apache/airflow/pull/50371#issuecomment-2979352249

   I ran a benchmark comparing three loading strategies:
   
   1. Pre-import before the parsing loop: call `_pre_import_airflow_modules()` once, before entering the `while` loop in `_run_parsing_loop()`.
   2. Pre-import before fork: call it right before forking each new parsing process.
   3. No pre-import.
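To make the three placements concrete, here is a minimal sketch that assumes a POSIX system because it uses `os.fork()`. `pre_import()`, `fork_parser()`, `parsing_loop()` and `MODULES_TO_PRE_IMPORT` are hypothetical stand-ins, not Airflow's `_pre_import_airflow_modules()` or `_run_parsing_loop()`. The sketch only shows where the import call sits relative to the loop and the fork. In this toy version, repeated `pre_import()` calls are cheap after the first one because the modules are already cached in `sys.modules`. The real function's per-call cost may differ.

```python
import importlib
import os
import sys

# Hypothetical module list. The benchmark used most core Airflow modules plus
# numpy, pandas, celery and k8s, which are not assumed to be installed here.
MODULES_TO_PRE_IMPORT = ["json", "decimal", "email.parser"]


def pre_import(modules):
    """Import each module into the current process, skipping any that fail."""
    loaded = []
    for name in modules:
        try:
            importlib.import_module(name)
            loaded.append(name)
        except ImportError:
            pass
    return loaded


def fork_parser(module_name):
    """Fork a child and report whether module_name was already imported in it."""
    pid = os.fork()
    if pid == 0:
        # Child: starts with a copy of the parent's memory, including sys.modules.
        os._exit(0 if module_name in sys.modules else 1)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


def parsing_loop(iterations, strategy):
    """Toy parsing loop; strategy is "before_loop", "before_fork" or "none"."""
    if strategy == "before_loop":
        pre_import(MODULES_TO_PRE_IMPORT)  # strategy 1: import once, before the loop
    results = []
    for _ in range(iterations):
        if strategy == "before_fork":
            pre_import(MODULES_TO_PRE_IMPORT)  # strategy 2: called again before every fork
        # With strategy "none" (3), nothing is pre-imported by this code.
        results.append(fork_parser(MODULES_TO_PRE_IMPORT[-1]))
    return results
```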
   
   **Setup**
   
   * 1,000 parse iterations per experiment
   * Loaded almost all core Airflow modules plus numpy, pandas, celery, and k8s (no providers, api, www, or cli modules)
   * Averaged over the 1,000 parse iterations (first run excluded)
   * Measured the following immediately after `self._start_new_processes()`:
   ```python
   process_creation_time = time.monotonic() - process_start_time
   ```
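The timing and averaging described in the setup can be sketched as follows. `mean_creation_time()` is a hypothetical helper, and the callable passed to it stands in for `self._start_new_processes()`. Only the `time.monotonic()` subtraction and the exclusion of the first run come from the setup above.

```python
import statistics
import time
from typing import Callable


def mean_creation_time(start_new_processes: Callable[[], None], iterations: int = 1000) -> float:
    """Time each call with time.monotonic() and average, excluding the first run."""
    if iterations < 2:
        raise ValueError("need at least two iterations to drop the warm-up run")
    samples = []
    for _ in range(iterations):
        process_start_time = time.monotonic()
        start_new_processes()  # stand-in for self._start_new_processes()
        samples.append(time.monotonic() - process_start_time)
    # The first iteration is treated as warm-up and left out of the average.
    return statistics.mean(samples[1:])
```

Running it once per strategy with the same `iterations` value gives one mean per strategy, which is the kind of comparison reported in the results.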
   
   **Results**
   
   “Before parsing loop” vs “before fork”
   
   * Pre-importing once before the parsing loop reduced `process_creation_time` by **92.8%** compared with pre-importing before each fork.
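For reference, the 92.8% figure is a relative reduction in mean `process_creation_time`. Writing $\bar t_{\text{loop}}$ and $\bar t_{\text{fork}}$ for the two means (notation introduced here), it corresponds to roughly a 14x difference:

$$
1 - \frac{\bar t_{\text{loop}}}{\bar t_{\text{fork}}} = 0.928
\quad\Longrightarrow\quad
\bar t_{\text{loop}} = 0.072\,\bar t_{\text{fork}},
\qquad
\frac{\bar t_{\text{fork}}}{\bar t_{\text{loop}}} = \frac{1}{0.072} \approx 13.9
$$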
   
   Baseline vs “before parsing loop”
   
   * Skipping the pre-import entirely was about 29% faster than pre-importing before the loop, but the absolute time saved was tiny and is probably just normal fork-timing noise.
   
   
![image](https://github.com/user-attachments/assets/f02a4f07-9eb6-4f38-a932-245c2e7df23e)
   
   **Conclusion**
   
   If my experimental setup is correct and pre-importing before the parse loop doesn’t introduce any unintended side effects, this could be an opportunity to reduce process-creation overhead.