wjddn279 opened a new pull request, #60505: URL: https://github.com/apache/airflow/pull/60505
## Motivation

discussed: https://lists.apache.org/thread/33hdp3hm705mzgrltv7o3468wvwbjsr3
closed: https://github.com/apache/airflow/issues/56879

## Insights

### trying to apply gc.freeze / unfreeze cycle

First, to apply it in the same way as implemented in LocalExecutor, I called gc.freeze immediately before forking and gc.unfreeze immediately after:

```python
os.register_at_fork(before=gc.freeze, after_in_parent=gc.unfreeze)
```

However, after applying this, memory inspection revealed an excessive memory leak.

<img width="500" height="150" alt="image" src="https://github.com/user-attachments/assets/00049743-73cf-4b66-a45c-57a738e018f9" />

This is the existing memory graph pattern.

<img width="500" height="150" alt="image" src="https://github.com/user-attachments/assets/492b39d2-e325-4829-a2fb-f6c8b7f45522" />

Looking at the shape of the graph, heap memory drops at specific intervals, which appears to be the typical pattern of an old-generation (generation 2) collection, so I inferred there might be a connection. I believe objects that should be cleaned up when a generation 2 collection occurs are frozen and thus escape gc, so they keep accumulating. As shown below, if you force a collection before freezing, or reduce the generation 2 threshold to an extremely low value, memory doesn't increase:

```python
import gc
import os


def freeze():
    # Collect first so that garbage is not moved into the permanent generation.
    gc.collect()
    gc.freeze()


os.register_at_fork(before=freeze, after_in_parent=gc.unfreeze)
```

or

```python
gc.set_threshold(700, 10, 1)
```

However, I judged that forcibly changing the gc flow would have very significant side effects, so I didn't apply this cycle.

### apply it before parsing start

Instead, I inferred that simply freezing the existing objects once would be sufficient to help prevent COW. There was a debate in the Python community about gc.freeze, and the main points are as follows:

https://discuss.python.org/t/it-seems-to-me-that-gc-freeze-is-pointless-and-the-documentation-misleading/71775

- Even with gc.freeze applied, COW occurs when the ref count of actual objects changes.
- Therefore, even if frozen, COW occurs for objects actually used in the forked process.
- However, the COW prevention effect for unused objects is clear.

Since Airflow loads the same modules for all components and much of what is loaded goes unused, I judged that simply freezing these objects would be sufficient to prevent COW, so I froze the objects created before the dag parsing loop runs.

## Performance

I deployed both the existing 3.1.5 version image and an image with gc.freeze applied to k8s, with the same plugins and dags deployed to the dag-processor. The parsing stats are as follows (dag names are masked):

```
Bundle       File Path                               PID    Current Duration    # DAGs    # Errors    Last Duration    Last Run At
-----------  --------------------------------------  -----  ------------------  --------  ----------  ---------------  -------------------
dags-folder  **/dynamic_dags_**_****.py              54374  7.01s               47        0           27.83s           2026-01-14T08:21:15
dags-folder  **/dynamic_dags_**_*******.py                                      54        0           20.90s           2026-01-14T08:21:41
dags-folder  **/dynamic_dags_**.py                   54325  8.04s               38        0           19.06s           2026-01-14T08:21:13
dags-folder  **/dynamic_dags_**_*******_**_*.py                                 13        0           18.43s           2026-01-14T08:21:35
dags-folder  **/dynamic_dags_**_*******_**_*.py                                 5         0           4.64s            2026-01-14T08:21:19
```

After monitoring memory usage for about two days, the results are as follows (the x axis is time in KST):

<img width="1618" height="447" alt="image" src="https://github.com/user-attachments/assets/176bf3fc-70c3-4bff-8a9e-a48178d2e073" />

<img width="1629" height="48" alt="Screenshot 2026-01-14 5.52.42 PM" src="https://github.com/user-attachments/assets/44e289f2-fb4f-40b1-a60e-8f27ceb43f46" />

I confirmed that the overall average memory usage is lower with gc.freeze, and the memory peak is also lower in the applied version. Looking at the broader picture, both versions show a slight upward trend in memory usage, which I judge is ultimately a problem that still needs to be resolved.

## Was generative AI tooling used to co-author this PR?

- [ ] Yes (please specify the tool below)
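To illustrate the approach described under "apply it before parsing start", here is a minimal sketch, not the code in this PR. It assumes a POSIX system where `os.fork` is available; `freeze_startup_objects`, `parse_in_child`, `main`, and the file names are hypothetical and exist only for illustration. `gc.freeze` and `gc.get_freeze_count` are standard-library calls (Python 3.7+).

```python
import gc
import os


def freeze_startup_objects() -> int:
    """Move every object currently tracked by the GC into the permanent generation.

    Returns the number of frozen objects so the caller can log it.
    """
    gc.freeze()
    return gc.get_freeze_count()


def parse_in_child(path: str) -> None:
    """Hypothetical stand-in for parsing one DAG file in a forked child."""
    pid = os.fork()
    if pid == 0:
        # Child: objects frozen before the fork sit in the permanent
        # generation, which collections in this process skip. Anything the
        # child actually uses can still be copied, because reference count
        # updates write to the object itself.
        _ = len(path)  # placeholder for real parsing work
        os._exit(0)
    os.waitpid(pid, 0)


def main(paths: list[str]) -> None:
    # Freeze once, before the loop starts, instead of around every fork.
    frozen = freeze_startup_objects()
    print(f"froze {frozen} objects before the parsing loop")
    for path in paths:
        parse_in_child(path)


if __name__ == "__main__":
    main(["example_dag_a.py", "example_dag_b.py"])
```

Freezing once at startup leaves the normal collection schedule alone afterwards, which is the reason given above for preferring it over the freeze / unfreeze cycle around each fork.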
