nickstenning opened a new pull request, #68569:
URL: https://github.com/apache/airflow/pull/68569
We don't need this until someone actually constructs a
`DagFileProcessorManager`, so defer it to reduce the impact of importing this
module.
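The change follows the usual deferred (function-level) import pattern. The PR does not name the deferred object, so this is a minimal sketch under assumptions: the class name is hypothetical, and `xml.dom.minidom` only stands in for the heavy dependency.

```python
import sys


class DagFileProcessorManagerSketch:
    """Hypothetical class illustrating a deferred (function-level) import.

    The expensive import runs only when an instance is constructed, so merely
    importing the module that defines this class does not pay its cost.
    """

    def __init__(self) -> None:
        # Deferred import: executed on first construction, then served from
        # sys.modules afterwards. `xml.dom.minidom` is only a stand-in for
        # whatever heavy dependency the real manager needs.
        from xml.dom import minidom

        self._minidom = minidom


manager = DagFileProcessorManagerSketch()
print("xml.dom.minidom" in sys.modules)
```

After the first construction the module is cached in `sys.modules`, so later instances pay only for a lookup, not a full import.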
Today, a test that only does `from airflow.dag_processing.manager import DagFileInfo`
pulls in ~20k extra GC-tracked objects and ~1.2k objects of garbage, none of which
the caller needs.
A minimal harness to show this:
```python
import gc
import time

# Disable automatic collection so the import's garbage is left for us to count.
gc.disable()

t0 = time.monotonic()
from airflow.dag_processing.manager import DagFileInfo  # noqa: E402,F401
import_s = time.monotonic() - t0

t0 = time.monotonic()
freed = gc.collect()  # number of unreachable objects found
collect_s = time.monotonic() - t0

import_ms = import_s * 1000
collect_ms = collect_s * 1000
print(import_ms, collect_ms, len(gc.get_objects()), freed)
```
Running this harness 10 times with and without this patch shows the size of the
difference (median, min, and max over the 10 runs):
| Metric | Before (median / min / max) | After (median / min / max) | Change |
|---|---|---|---|
| import (ms) | 897.5 / 842.8 / 925.0 | 797.9 / 774.8 / 853.0 | -11.1% (1.12x) |
| gc.collect (ms) | 67.2 / 57.3 / 81.2 | 54.6 / 49.1 / 65.6 | -18.8% (1.23x) |
| tracked objects (median) | 261611 | 241052 | -7.9% (1.09x) |
| freed (median) | 22152 | 21064 | -4.9% (1.05x) |
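The PR does not include the script that aggregated the 10 runs. A hypothetical sketch of how such a summary could be produced: each run uses a fresh interpreter (a repeated import in one process would be served from `sys.modules`), and the medians are compared as a percentage change and a before/after ratio. The function names are illustrative only.

```python
import statistics
import subprocess
import sys
from typing import Sequence


def run_samples(harness: str, runs: int = 10) -> list[list[float]]:
    """Run `harness` in a fresh interpreter `runs` times.

    Each run must print whitespace-separated numbers on its last line, as the
    harness above does. A fresh process per run matters: a second import in
    the same process would hit sys.modules and cost almost nothing.
    """
    samples = []
    for _ in range(runs):
        result = subprocess.run(
            [sys.executable, "-c", harness],
            capture_output=True,
            text=True,
            check=True,
        )
        # Use the last line in case the imported code prints anything itself.
        last_line = result.stdout.strip().splitlines()[-1]
        samples.append([float(x) for x in last_line.split()])
    return samples


def summarize(values: Sequence[float]) -> tuple[float, float, float]:
    """Return (median, min, max) for one metric."""
    return statistics.median(values), min(values), max(values)


def change(before: float, after: float) -> tuple[float, float]:
    """Return (percent change, before/after ratio) for two medians."""
    percent = (after - before) / before * 100
    return round(percent, 1), round(before / after, 2)


print(change(897.5, 797.9))  # (-11.1, 1.12)
```

Feeding the reported import medians into `change` reproduces the "-11.1%, 1.12x" figure for import time.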