wjddn279 commented on issue #55768:
URL: https://github.com/apache/airflow/issues/55768#issuecomment-3374894204

   @kaxil 
   
   I am still in the process of investigating, but to summarize my findings so 
far:
   the increase in memory usage mainly comes from the worker processes created 
inside the scheduler container.
   
   The scheduler itself runs as a single process, so its own memory growth is limited and does not contribute significantly. However, since 32 worker processes are forked by default, their combined footprint amplifies the overall memory usage by roughly a factor of 32.
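   
   (For reference, the worker count in my setup appears to come from the `[core] parallelism` setting, whose default is 32. The snippet below simply reads that value back through the standard config API; treating it as the LocalExecutor pool size is my working assumption, not something this snippet verifies.)
   
   ```python
   # Quick check of how many LocalExecutor workers this deployment will fork.
   # Assumption: LocalExecutor sizes its worker pool from [core] parallelism (default 32).
   from airflow.configuration import conf

   parallelism = conf.getint("core", "parallelism")
   print(f"LocalExecutor will fork up to {parallelism} worker processes")
   ```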
   
   To investigate further, I used Python’s tracemalloc and modified the Airflow 
code to trace objects consuming large amounts of memory within the worker 
processes.
   
   
[tracemalloc_log.txt](https://github.com/user-attachments/files/22733403/tracemalloc_log.txt)
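   
   For anyone who wants to reproduce the tracing, the sketch below shows the general shape of the instrumentation; the exact placement inside the Airflow code differs, and the `airflow.models` import is only a stand-in for whatever a worker pulls in at startup:
   
   ```python
   import tracemalloc

   def dump_top_allocations(limit: int = 20) -> None:
       """Print the source lines currently holding the most allocated memory."""
       snapshot = tracemalloc.take_snapshot()
       for stat in snapshot.statistics("lineno")[:limit]:
           print(stat)

   if __name__ == "__main__":
       # Keep 25 stack frames so long import chains remain attributable.
       tracemalloc.start(25)
       import airflow.models  # stand-in for the imports a worker performs at startup
       dump_top_allocations()
   ```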
   
   The investigation showed that most of the memory usage originates from 
library imports. Since each process loads the same libraries independently, the 
total footprint scales almost linearly with the number of workers. 
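   
   As a side note, the per-worker footprint can be watched from outside with something like the following; this uses psutil on Linux and is not part of my original instrumentation (the scheduler PID is passed on the command line):
   
   ```python
   import sys
   import psutil

   def report_worker_pss(scheduler_pid: int) -> None:
       """Print PSS per forked worker and the total (Linux only; PSS splits shared pages fairly)."""
       total = 0.0
       for child in psutil.Process(scheduler_pid).children(recursive=True):
           pss_mib = child.memory_full_info().pss / 2**20
           total += pss_mib
           print(f"pid={child.pid:<8} pss={pss_mib:8.1f} MiB")
       print(f"total worker PSS: {total:.1f} MiB")

   if __name__ == "__main__":
       report_worker_pss(int(sys.argv[1]))
   ```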
   
   After removing certain heavy imports and rerunning the system, I observed a 
significant reduction in per-worker memory usage. A container that previously 
failed within 30 minutes under an 8 GiB memory limit was able to run for over 
two hours without issues. (This test environment is quite demanding, as 100 
DAGs are triggered every minute.)
   
   However, the total memory used by all workers still continued to increase 
over time.
   <img width="400" height="400" alt="Image" src="https://github.com/user-attachments/assets/566e707b-d66d-4ecb-b389-09cbc522609d" />
   As shown in the attached figure, there are large variations in PSS between 
workers, and eventually all workers cross into higher memory usage regions. My 
current hypothesis is as follows:
   - In the LocalExecutor, workers are initially forked from the scheduler 
process, so they share memory pages through copy-on-write (COW). When 
additional libraries are imported or specific logic is executed later, those 
shared pages are duplicated, leading to higher private memory usage.
   - In addition to the import-related issue, I suspect that scheduler objects inherited by the workers are gradually modified during execution, causing small but steady PSS growth as copy-on-write pages are created (a minimal demonstration of this effect follows this list).
   - In other executor types (e.g., CeleryExecutor, KubernetesExecutor), 
workers are launched as independent processes, so every library import directly 
adds to total memory usage without COW benefits.
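   
   The copy-on-write effect from the second bullet can be reproduced outside Airflow with a standalone sketch. The parent builds a large Python structure before forking; in the child, merely iterating over the inherited objects writes to their reference counts, which dirties the shared pages and makes the child's PSS jump (Linux only, reads /proc/self/smaps_rollup):
   
   ```python
   import os

   def pss_kib() -> int:
       """Proportional set size of the current process in KiB (Linux, kernel >= 4.14)."""
       with open("/proc/self/smaps_rollup") as f:
           for line in f:
               if line.startswith("Pss:"):
                   return int(line.split()[1])
       return -1

   # Parent allocates ~100 MiB of Python objects before forking, like the scheduler
   # holding state before LocalExecutor workers are created.
   data = [b"x" * 1024 for _ in range(100_000)]

   pid = os.fork()
   if pid == 0:  # child: stands in for a LocalExecutor worker
       print(f"worker PSS right after fork:       {pss_kib()} KiB")
       # Read-only iteration still touches every object's refcount, so the kernel
       # copies the dirtied pages and the child's private share of memory grows.
       total = sum(len(chunk) for chunk in data)
       print(f"worker PSS after touching objects: {pss_kib()} KiB (read {total} bytes)")
       os._exit(0)
   else:
       os.waitpid(pid, 0)
   ```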
   
   
   My hypothesis is that some heavy libraries are still present: they are initially loaded in the scheduler and shared with the workers at fork time, but certain logic later causes them to be re-imported or re-initialized, resulting in private memory allocations in each worker process.
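   
   One way to catch such post-fork imports is a small import hook along the following lines. This is an illustrative diagnostic rather than existing Airflow code, and it would need to be installed in the scheduler before the executor forks its workers:
   
   ```python
   import builtins
   import os
   import sys

   _real_import = builtins.__import__
   _install_pid = os.getpid()  # PID of the process that installed the hook (the scheduler)

   def _logging_import(name, *args, **kwargs):
       # Only first-time imports matter: they allocate new, process-private memory.
       first_time = name not in sys.modules
       module = _real_import(name, *args, **kwargs)
       if first_time and os.getpid() != _install_pid:
           print(f"[pid {os.getpid()}] post-fork import: {name}", file=sys.stderr)
       return module

   builtins.__import__ = _logging_import
   ```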
   
   I will continue to trace the problematic libraries and will open separate 
issues as I identify them.
   Please let me know if you have any questions or different perspectives on my 
findings so far.

