mobuchowski opened a new pull request, #41494:
URL: https://github.com/apache/airflow/pull/41494
Currently `_get_parsed_dag_tree` uses `get_tree_view` which in degenerated
case (like, in test `test_get_dag_tree_large_dag` can generate string tree
representation taking multiple gigabytes of memory.
However, for what it's trying to accomplish, pretty much any temporary
allocations are not necessary. This fix constructs task dependency tree without
intermediate representation.
Previous memory consumption:
```
root@a24bae3584cb:/opt/airflow# pytest --memray
tests/providers/openlineage/utils/test_utils.py::test_get_dag_tree_large_dag
===========================================================================================================================================================================
test session starts
============================================================================================================================================================================
platform linux -- Python 3.12.5, pytest-8.3.2, pluggy-1.5.0 --
/usr/local/bin/python
cachedir: .pytest_cache
rootdir: /opt/airflow
configfile: pyproject.toml
plugins: memray-1.7.0, timeouts-1.2.1, icdiff-0.9, mock-3.14.0,
rerunfailures-14.0, requests-mock-1.12.1, xdist-3.6.1, asyncio-0.23.8,
anyio-4.4.0, instafail-0.5.0, cov-5.0.0, time-machine-2.15.0,
custom-exit-code-0.3.0
asyncio: mode=Mode.STRICT
setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s
collected 1 item
tests/providers/openlineage/utils/test_utils.py::test_get_dag_tree_large_dag
PASSED
[100%]
==============================================================================================================================================================================
MEMRAY REPORT
===============================================================================================================================================================================
Allocation results for
tests/providers/openlineage/utils/test_utils.py::test_get_dag_tree_large_dag at
the high watermark
📦 Total memory allocated: 5.4GiB
📏 Total allocations: 23
📊 Histogram of allocation sizes: |▁▁█ |
🥇 Biggest allocating functions:
-
_safe_get_dag_tree_view:/opt/airflow/airflow/providers/openlineage/utils/utils.py:446
-> 2.7GiB
- get_tree_view:/opt/airflow/airflow/models/dag.py:2445 ->
2.7GiB
- __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191
-> 1.3MiB
- __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191
-> 1.3MiB
- __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191
-> 1.3MiB
===================================================================================================================================================================
Warning summary. Total: 3, Unique: 3
===================================================================================================================================================================
airflow: total 1, unique 1
collect: total 1, unique 1
other: total 2, unique 2
collect: total 2, unique 2
Warnings saved into /opt/airflow/tests/warnings.txt file.
============================================================================================================================================================================
1 passed in 8.60s
=============================================================================================================================================================================
```
current memory consumption:
```
root@2788b43cb914:/opt/airflow# pytest --memray
tests/providers/openlineage/utils/test_utils.py::test_get_dag_tree_large_dag
===========================================================================================================================================================================
test session starts
============================================================================================================================================================================
platform linux -- Python 3.12.5, pytest-8.3.2, pluggy-1.5.0 --
/usr/local/bin/python
cachedir: .pytest_cache
rootdir: /opt/airflow
configfile: pyproject.toml
plugins: memray-1.7.0, timeouts-1.2.1, icdiff-0.9, mock-3.14.0,
rerunfailures-14.0, requests-mock-1.12.1, xdist-3.6.1, asyncio-0.23.8,
anyio-4.4.0, instafail-0.5.0, cov-5.0.0, time-machine-2.15.0,
custom-exit-code-0.3.0
asyncio: mode=Mode.STRICT
setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s
collected 1 item
tests/providers/openlineage/utils/test_utils.py::test_get_dag_tree_large_dag
PASSED
[100%]
==============================================================================================================================================================================
MEMRAY REPORT
===============================================================================================================================================================================
Allocation results for
tests/providers/openlineage/utils/test_utils.py::test_get_dag_tree_large_dag at
the high watermark
📦 Total memory allocated: 16.0MiB
📏 Total allocations: 30
📊 Histogram of allocation sizes: |▁█ ▃▇|
🥇 Biggest allocating functions:
- __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191
-> 1.3MiB
- __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191
-> 1.3MiB
- __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191
-> 1.3MiB
- __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191
-> 1.3MiB
- __setattr__:/opt/airflow/airflow/models/baseoperator.py:1191
-> 1.3MiB
===================================================================================================================================================================
Warning summary. Total: 3, Unique: 3
===================================================================================================================================================================
airflow: total 1, unique 1
collect: total 1, unique 1
other: total 2, unique 2
collect: total 2, unique 2
Warnings saved into /opt/airflow/tests/warnings.txt file.
============================================================================================================================================================================
1 passed in 2.49s
=============================================================================================================================================================================
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]