wjddn279 commented on issue #55768: URL: https://github.com/apache/airflow/issues/55768#issuecomment-3374894204
@kaxil I am still investigating, but to summarize my findings so far: the increase in memory usage mainly comes from the worker processes created inside the scheduler container. The scheduler itself runs as a single process, so its own memory growth is limited and does not contribute significantly. However, since 32 worker processes are forked by default, their combined effect amplifies the overall memory usage roughly by a factor of 32.

To investigate further, I used Python's `tracemalloc` and modified the Airflow code to trace objects consuming large amounts of memory within the worker processes (a simplified sketch of that instrumentation is included at the end of this comment): [tracemalloc_log.txt](https://github.com/user-attachments/files/22733403/tracemalloc_log.txt)

The investigation showed that most of the memory usage originates from library imports. Since each process loads the same libraries independently, the total footprint scales almost linearly with the number of workers. After removing certain heavy imports and rerunning the system, I observed a significant reduction in per-worker memory usage. A container that previously failed within 30 minutes under an 8 GiB memory limit was able to run for over two hours without issues. (This test environment is quite demanding, as 100 DAGs are triggered every minute.)

However, the total memory used by all workers still continued to increase over time.

<img width="400" height="400" alt="Image" src="https://github.com/user-attachments/assets/566e707b-d66d-4ecb-b389-09cbc522609d" />

As shown in the attached figure, there are large variations in PSS between workers, and eventually all workers cross into higher memory usage regions.

My current hypothesis is as follows (see the toy copy-on-write demonstration at the end of this comment):

- In the LocalExecutor, workers are initially forked from the scheduler process, so they share memory pages through copy-on-write (COW). When additional libraries are imported or specific logic is executed later, those shared pages are duplicated, leading to higher private memory usage.
- In addition to the import-related issue, I suspect that scheduler objects inherited by the workers are gradually modified during execution, causing small but steady PSS growth as copy-on-write pages are created.
- In other executor types (e.g., CeleryExecutor, KubernetesExecutor), workers are launched as independent processes, so every library import directly adds to the total memory usage without any COW benefit.

In short, my hypothesis is that some heavy libraries are still present: they are initially loaded in the scheduler and shared with the workers at fork time, but certain logic later causes them to be re-imported, resulting in private memory allocations in each process.

I will continue to trace the problematic libraries and will open separate issues as I identify them. Please let me know if you have any questions or different perspectives on my findings so far.
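For reference, this is roughly the kind of `tracemalloc` instrumentation I added inside the worker process. The helper name and its placement are simplified for illustration and are not the exact patch I ran:

```python
import tracemalloc

# Illustrative sketch only: start tracing early in the worker process and keep
# enough stack frames to see which import / call chain made each allocation.
tracemalloc.start(25)


def dump_top_allocations(limit: int = 20) -> None:
    """Print the source lines that currently hold the most allocated memory."""
    snapshot = tracemalloc.take_snapshot()
    # Drop tracemalloc's own bookkeeping frames to reduce noise in the output.
    snapshot = snapshot.filter_traces(
        (tracemalloc.Filter(False, tracemalloc.__file__),)
    )
    for stat in snapshot.statistics("lineno")[:limit]:
        print(stat)
```

Periodically calling such a helper from inside the worker is enough to see which source lines dominate; in my runs the top entries were overwhelmingly module imports, which is what led to the conclusion above.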
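For anyone who wants to reproduce the per-worker PSS measurements, something like the following works on Linux by reading `/proc/<pid>/smaps_rollup`. This is a generic sketch, not the exact script that produced the figure, and the worker PIDs are assumed to be discovered externally (for example via `pgrep -P <scheduler_pid>`):

```python
import sys
import time


def pss_kib(pid: int) -> int:
    """Return a process's PSS in KiB from /proc/<pid>/smaps_rollup (Linux only)."""
    with open(f"/proc/{pid}/smaps_rollup") as f:
        for line in f:
            if line.startswith("Pss:"):
                return int(line.split()[1])  # value is reported in kB
    raise RuntimeError(f"no Pss line found for pid {pid}")


if __name__ == "__main__":
    # Pass the worker PIDs on the command line and sample them once a minute.
    pids = [int(p) for p in sys.argv[1:]]
    while True:
        print({pid: pss_kib(pid) for pid in pids})
        time.sleep(60)
```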
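And to illustrate the copy-on-write part of the hypothesis, here is a self-contained toy example: right after `fork()` the child still shares its pages with the parent, and a library imported only after the fork shows up as additional private memory in the child. `numpy` is just a stand-in for any heavy library that a LocalExecutor worker might import lazily after being forked:

```python
import os


def pss_kib(pid: int) -> int:
    with open(f"/proc/{pid}/smaps_rollup") as f:
        for line in f:
            if line.startswith("Pss:"):
                return int(line.split()[1])
    return -1


child = os.fork()
if child == 0:
    # Child process: immediately after fork() its pages are shared with the
    # parent, so its PSS only reflects its proportional share of those pages.
    before = pss_kib(os.getpid())
    import numpy  # noqa: F401  # stand-in for a heavy post-fork import
    after = pss_kib(os.getpid())
    print(f"child PSS: {before} KiB -> {after} KiB after post-fork import")
    os._exit(0)
else:
    os.waitpid(child, 0)
```

In a real worker the same effect would come from modules imported lazily inside the task execution path rather than an explicit import like this.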
