Lzzz666 commented on PR #50371: URL: https://github.com/apache/airflow/pull/50371#issuecomment-2999234384
> I ran a benchmark comparing three loading strategies:
>
> 1. Pre-import before parsing loop
>    Call `_pre_import_airflow_modules()` once before entering the while loop in `_run_parsing_loop()`.
> 2. Pre-import before fork
>    Call it right before forking the process (the original place).
> 3. No pre-import
>
> **Setup**
>
> * 1,000 parse iterations per experiment
> * Loaded almost all core Airflow modules plus numpy, pandas, celery, and k8s (no providers, api, www, cli)
> * Averaged over the 1,000 parses (first run excluded)
> * Measured
>
>   ```python
>   process_creation_time = time.monotonic() - process_start_time
>   ```
>
>   immediately after `self._start_new_processes()`
>
> **Results**
>
> “Before parsing loop” vs “before fork”
>
> * Moving the pre-import outside the parsing loop reduced `process_creation_time` by **92.8%** compared to doing it before each fork.
>
> Baseline vs “before parsing loop”
>
> * Running with no pre-import at all was about 29% faster than pre-importing before the loop, but the absolute time saved was tiny, probably just normal fork timing noise.
>
> **Conclusion**
>
> If my experimental setup is correct and pre-importing before the parse loop doesn’t introduce any unintended side effects, it might offer an opportunity to improve efficiency. Should we consider running the pre-import prior to the parsing loop?
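
For anyone who wants to reproduce the comparison outside Airflow, here is a minimal sketch of the three strategies. It is not the actual benchmark code. The module list, the `pre_import` and `fork_child` helpers, and the choice to start the timer just before the per-fork pre-import are all assumptions made for illustration. In the real setup the call would be `_pre_import_airflow_modules()` and the fork would happen inside `self._start_new_processes()`.

```python
import importlib
import os
import statistics
import time

# Hypothetical stand-in for the module list used in the benchmark
# (core Airflow modules plus numpy, pandas, celery, and k8s).
MODULES = ["json", "decimal", "email.parser"]

STRATEGIES = {"before_loop", "before_fork", "none"}


def pre_import(modules):
    """Import each module by name; repeat calls resolve from sys.modules."""
    for name in modules:
        importlib.import_module(name)


def fork_child():
    """Fork a child that exits immediately and return its pid (POSIX only)."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit without running cleanup handlers
    return pid


def benchmark(strategy, iterations=50, spawn=fork_child):
    """Return the mean process creation time in seconds, first run excluded."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")

    if strategy == "before_loop":
        pre_import(MODULES)  # once, outside the loop

    timings = []
    for _ in range(iterations + 1):
        # Assumption: the timer covers the per-fork pre-import as well.
        start = time.monotonic()
        if strategy == "before_fork":
            pre_import(MODULES)  # repeated on every iteration
        pid = spawn()
        timings.append(time.monotonic() - start)
        if pid:
            os.waitpid(pid, 0)  # reap the child outside the timed region

    return statistics.mean(timings[1:])  # drop the first (warm-up) run


if __name__ == "__main__" and hasattr(os, "fork"):
    for name in sorted(STRATEGIES):
        print(f"{name}: {benchmark(name) * 1e6:.1f} µs per fork")
```

With small stdlib modules the differences will be much smaller than with the Airflow module set, so treat the numbers from this sketch as a sanity check of the method rather than a replication of the results above.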