Re: [PR] Add back dag parsing pre-import optimization [airflow]

via GitHub Tue, 24 Jun 2025 03:11:56 -0700


Lzzz666 commented on PR #50371:
URL: https://github.com/apache/airflow/pull/50371#issuecomment-2999234384


   > I ran a benchmark comparing three loading strategies:
   > 
   > 1. Pre-import before parsing loop
   >    Call `_pre_import_airflow_modules()` once before entering the `while` loop in `_run_parsing_loop()`.
   > 2. Pre-import before fork
   >    Call it right before forking each process (the original location).
   > 3. No pre-import
   > 
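The three strategies above can be sketched roughly as follows. This is a minimal illustration, not Airflow's implementation: `MODULES`, `pre_import`, `fork_and_wait`, and `run` are hypothetical names, the module list is a stand-in for the modules a DAG file would import, and the timing here includes reaping the child, which the benchmark's `process_creation_time` may not. It assumes a POSIX system where `os.fork()` is available.

```python
import importlib
import os
import time

# Hypothetical stand-ins for the heavy modules a DAG file would import.
MODULES = ["json", "decimal", "email.mime.text"]


def pre_import(modules):
    """Import modules in the parent so forked children inherit them."""
    for name in modules:
        importlib.import_module(name)


def fork_and_wait():
    """Fork a child that exits immediately, then reap it (POSIX only)."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: a real parser would process a DAG file here
    os.waitpid(pid, 0)


def run(strategy, iterations=3):
    """Return per-iteration wall times for 'before_loop', 'before_fork', or 'none'."""
    if strategy not in {"before_loop", "before_fork", "none"}:
        raise ValueError(f"unknown strategy: {strategy}")
    if strategy == "before_loop":
        pre_import(MODULES)  # strategy 1: once, outside the loop
    timings = []
    for _ in range(iterations):
        start = time.monotonic()
        if strategy == "before_fork":
            pre_import(MODULES)  # strategy 2: repeated before every fork
        fork_and_wait()  # strategy 3 ('none') only ever reaches this call
        timings.append(time.monotonic() - start)
    return timings
```

In this toy version the repeated imports in strategy 2 are cheap cached lookups after the first iteration, so it shows the control flow rather than reproducing the measured difference.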
   > **Setup**
   > 
   > * 1,000 parse iterations per experiment
   > * Loaded almost all core Airflow modules plus numpy, pandas, celery, and k8s (no providers, api, www, cli)
   > * Averaged over the 1,000 parses (first run excluded)
   > * Measured
   > 
   > ```python
   > process_creation_time = time.monotonic() - process_start_time
   > ```
   > 
   > immediately after `self._start_new_processes()`
   > 
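As a rough sketch of the measurement described in the setup, the helpers below time a call with `time.monotonic()` and average the samples while dropping the first run. `time_call` and `average_excluding_first` are hypothetical names introduced for illustration; the actual benchmark harness is not shown in the comment.

```python
import statistics
import time


def time_call(fn):
    """Return the wall-clock seconds fn() takes, using a monotonic clock."""
    start = time.monotonic()
    fn()
    return time.monotonic() - start


def average_excluding_first(samples):
    """Mean of the samples with the first (warm-up) run dropped."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to drop the first run")
    return statistics.mean(samples[1:])
```

Dropping the first sample keeps one-off warm-up costs, such as cold imports, out of the average.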
   > **Results**
   > 
   > “Before parsing loop” vs “before fork”
   > 
   > * Moving the pre-import outside the parsing loop reduced `process_creation_time` by **92.8%** compared to doing it before each fork.
   > 
   > Baseline vs “before parsing loop”
   > 
   > * Skipping pre-import entirely was about 29% faster than pre-importing before the loop, but the absolute time saved was tiny, probably just normal fork timing noise.
   > 
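For reference, relative figures like the ones above are usually computed as a percentage reduction against a baseline average. The sketch below assumes that definition; the comment does not state the exact formula, and `percent_reduction` is a hypothetical helper. The values in the usage note are made up for illustration and are not the benchmark's numbers.

```python
def percent_reduction(baseline, candidate):
    """Percentage by which candidate is smaller than baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - candidate) / baseline * 100.0
```

With made-up averages, `percent_reduction(12.0, 3.0)` gives 75.0. A pair such as `percent_reduction(0.004, 0.003)` gives roughly 25 even though the absolute difference is only one millisecond, which is why a large percentage on a very small timing can still fall within noise.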
   > 
   > ![image](https://private-user-images.githubusercontent.com/58145495/455859558-f02a4f07-9eb6-4f38-a932-245c2e7df23e.png)
   > 
   > **Conclusion**
   > 
   > If my experimental setup is correct, and pre-importing before the parse 
loop doesn’t introduce any unintended side effects, it might offer an 
opportunity to improve efficiency.
   
   Assuming my experimental setup is correct, should we consider running the pre-import before the parsing loop?


