mobuchowski opened a new pull request, #41494:
URL: https://github.com/apache/airflow/pull/41494

   Currently `_get_parsed_dag_tree` uses `get_tree_view`, which in a degenerate
case (such as the one in the `test_get_dag_tree_large_dag` test) can generate a
string representation of the tree that takes multiple gigabytes of memory.
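
To illustrate how a text rendering of a tree can grow far beyond the size of the DAG itself, here is a minimal sketch. It assumes a renderer that writes one line per task on every path from a root, so shared subtrees are repeated. The layered DAG and the helper names (`layered_dag`, `rendered_line_count`) are hypothetical and are not taken from the test or from Airflow.

```python
from functools import lru_cache


def layered_dag(width: int, depth: int) -> dict[str, list[str]]:
    """Hypothetical DAG: `depth` layers of `width` tasks, where every task
    feeds every task in the next layer. Maps task id -> downstream task ids."""
    downstream: dict[str, list[str]] = {}
    for layer in range(depth):
        for i in range(width):
            if layer == depth - 1:
                children = []
            else:
                children = [f"t_{layer + 1}_{j}" for j in range(width)]
            downstream[f"t_{layer}_{i}"] = children
    return downstream


def rendered_line_count(downstream: dict[str, list[str]], roots: list[str]) -> int:
    """Count the lines a path-expanding tree rendering would contain,
    without building the rendering itself."""

    @lru_cache(maxsize=None)
    def lines_below(task_id: str) -> int:
        # One line for the task, plus a full copy of each child's subtree.
        return 1 + sum(lines_below(child) for child in downstream[task_id])

    return sum(lines_below(root) for root in roots)


dag = layered_dag(width=3, depth=4)
roots = [f"t_0_{i}" for i in range(3)]
print(len(dag))                         # 12 tasks
print(rendered_line_count(dag, roots))  # 120 rendered lines
```

With only 12 tasks the rendering already has 120 lines; under the same assumption, the count grows exponentially with depth while the number of tasks grows linearly.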
   
   However, for what it is trying to accomplish, almost none of those temporary
allocations are necessary. This fix constructs the task dependency tree directly,
without the intermediate string representation.
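
As a rough sketch of the idea (not the code in this PR), the dependency tree can be assembled as nested dictionaries straight from the task graph. The function name `build_dependency_tree` and the plain `dict` input are assumptions made for illustration; the real implementation works on Airflow task objects.

```python
def build_dependency_tree(downstream: dict[str, list[str]]) -> dict[str, dict]:
    """Return {root_id: {child_id: {...}}} built directly from a mapping of
    task id -> downstream task ids, with no string rendering in between."""
    # Any task that appears as someone's child is not a root.
    has_upstream = {child for children in downstream.values() for child in children}
    subtrees: dict[str, dict] = {}

    def subtree(task_id: str) -> dict:
        # Build each task's subtree once and reuse the same dict object
        # wherever the task is reachable, so shared subtrees are not copied.
        if task_id not in subtrees:
            subtrees[task_id] = {
                child: subtree(child) for child in sorted(downstream[task_id])
            }
        return subtrees[task_id]

    return {
        task_id: subtree(task_id)
        for task_id in sorted(downstream)
        if task_id not in has_upstream
    }


downstream = {
    "extract": ["clean", "audit"],
    "clean": ["load"],
    "audit": ["load"],
    "load": [],
}
tree = build_dependency_tree(downstream)
print(tree)
# {'extract': {'audit': {'load': {}}, 'clean': {'load': {}}}}
```

In this sketch, memory stays proportional to the number of tasks and edges because repeated subtrees are shared references. It uses recursion, so a very deep chain of tasks could hit Python's recursion limit.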
   
   Previous memory consumption:
   
   ```
   root@a24bae3584cb:/opt/airflow# pytest --memray tests/providers/openlineage/utils/test_utils.py::test_get_dag_tree_large_dag
   ============================= test session starts ==============================
   platform linux -- Python 3.12.5, pytest-8.3.2, pluggy-1.5.0 -- /usr/local/bin/python
   cachedir: .pytest_cache
   rootdir: /opt/airflow
   configfile: pyproject.toml
   plugins: memray-1.7.0, timeouts-1.2.1, icdiff-0.9, mock-3.14.0, rerunfailures-14.0, requests-mock-1.12.1, xdist-3.6.1, asyncio-0.23.8, anyio-4.4.0, instafail-0.5.0, cov-5.0.0, time-machine-2.15.0, custom-exit-code-0.3.0
   asyncio: mode=Mode.STRICT
   setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s
   collected 1 item
   
   tests/providers/openlineage/utils/test_utils.py::test_get_dag_tree_large_dag PASSED [100%]
   
   ================================ MEMRAY REPORT =================================
   Allocation results for tests/providers/openlineage/utils/test_utils.py::test_get_dag_tree_large_dag at the high watermark
   
         📦 Total memory allocated: 5.4GiB
         📏 Total allocations: 23
         📊 Histogram of allocation sizes: |▁▁█  |
         🥇 Biggest allocating functions:
                - _safe_get_dag_tree_view:/opt/airflow/airflow/providers/openlineage/utils/utils.py:446 -> 2.7GiB
                - get_tree_view:/opt/airflow/airflow/models/dag.py:2445 -> 2.7GiB
                - __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191 -> 1.3MiB
                - __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191 -> 1.3MiB
                - __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191 -> 1.3MiB
   
   ===================== Warning summary. Total: 3, Unique: 3 =====================
   airflow: total 1, unique 1
     collect: total 1, unique 1
   other: total 2, unique 2
     collect: total 2, unique 2
   Warnings saved into /opt/airflow/tests/warnings.txt file.
   ============================== 1 passed in 8.60s ===============================
   ```
   
   Current memory consumption (with this fix):
   
   ```
   root@2788b43cb914:/opt/airflow# pytest --memray tests/providers/openlineage/utils/test_utils.py::test_get_dag_tree_large_dag
   ============================= test session starts ==============================
   platform linux -- Python 3.12.5, pytest-8.3.2, pluggy-1.5.0 -- /usr/local/bin/python
   cachedir: .pytest_cache
   rootdir: /opt/airflow
   configfile: pyproject.toml
   plugins: memray-1.7.0, timeouts-1.2.1, icdiff-0.9, mock-3.14.0, rerunfailures-14.0, requests-mock-1.12.1, xdist-3.6.1, asyncio-0.23.8, anyio-4.4.0, instafail-0.5.0, cov-5.0.0, time-machine-2.15.0, custom-exit-code-0.3.0
   asyncio: mode=Mode.STRICT
   setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s
   collected 1 item
   
   tests/providers/openlineage/utils/test_utils.py::test_get_dag_tree_large_dag PASSED [100%]
   
   ================================ MEMRAY REPORT =================================
   Allocation results for tests/providers/openlineage/utils/test_utils.py::test_get_dag_tree_large_dag at the high watermark
   
         📦 Total memory allocated: 16.0MiB
         📏 Total allocations: 30
         📊 Histogram of allocation sizes: |▁█ ▃▇|
         🥇 Biggest allocating functions:
                - __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191 -> 1.3MiB
                - __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191 -> 1.3MiB
                - __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191 -> 1.3MiB
                - __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191 -> 1.3MiB
                - __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191 -> 1.3MiB
   
   ===================== Warning summary. Total: 3, Unique: 3 =====================
   airflow: total 1, unique 1
     collect: total 1, unique 1
   other: total 2, unique 2
     collect: total 2, unique 2
   Warnings saved into /opt/airflow/tests/warnings.txt file.
   ============================== 1 passed in 2.49s ===============================
   ```

