GitHub user wjddn279 created a discussion: Discussion: Preventing COW in 
LocalExecutor Workers

I investigated the final part of [this 
issue](https://github.com/apache/airflow/issues/56641). [kaxil's 
work](https://github.com/apache/airflow/issues/55768#issuecomment-3461520365) 
resolved the first two problems in version 3.1.2.

However, LocalExecutor still shows a continuous memory increase. Using smem, I 
observed that worker processes starting at 20-30MB grow to over 100MB within 
1-2 hours: PSS rises steadily to ~50MB and then suddenly jumps to ~100MB 
([another 
case](https://github.com/apache/airflow/issues/56641#issuecomment-3479139319)).

As [jarek 
mentioned](https://github.com/apache/airflow/issues/56641#issuecomment-3407182357),
 I confirmed this is caused by copy-on-write (COW) against parent-process 
memory: examining the workers' pmap output showed increasing PSS for address 
ranges shared with the parent process.

Initially I tried minimizing pre-fork memory allocation, but this was 
ineffective: initialization already loads heavy libraries/objects (sqlalchemy, 
fastapi, LazyDeserializedDAG, etc.) that consume at least 150MB.

I found [this 
article](https://instagram-engineering.com/copy-on-write-friendly-python-garbage-collection-ad6ed5233ddf),
 which describes how gc.freeze prevents COW by moving all GC-tracked objects 
into a permanent generation that the collector never touches. I confirmed this 
prevents COW in our workers. (Benchmark results will be attached with the PR.)
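For context, the gc API involved here is small. A minimal sketch of what freezing does:

```python
import gc

# Allocate some GC-tracked objects, as the scheduler would before forking.
data = [{"task": i} for i in range(1000)]

gc.collect()                  # collect garbage first so it isn't frozen too
gc.freeze()                   # move all tracked objects to the permanent generation
print(gc.get_freeze_count())  # number of objects the collector will now skip

# The permanent generation is never scanned by collections, so a forked
# child's GC won't write to those objects' pages (flags, generation links)
# and they stay shared with the parent.

gc.unfreeze()                 # put objects back into the oldest generation
print(gc.get_freeze_count())  # -> 0
```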

Before sharing the solution code, I'd like to discuss:

1. Are there side effects of gc.freeze?
The approach is simple and works well for us, but I'd like others' opinions on 
potential issues. It would apply only to LocalExecutor: calling gc.freeze -> 
worker fork -> gc.unfreeze (to release objects from the permanent generation) 
prevents COW without affecting scheduler operation.
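A hypothetical sketch of that ordering (run_worker is a stand-in for the actual worker entrypoint, not Airflow code):

```python
import gc
import os

def run_worker() -> None:
    # Child: do the task work. It never calls gc.unfreeze(), so the frozen
    # pages inherited from the parent are only read, never written by the GC.
    os._exit(0)

def spawn_worker() -> int:
    gc.collect()   # drop garbage first so it isn't pinned by the freeze
    gc.freeze()    # pin all tracked parent objects in the permanent generation
    pid = os.fork()
    if pid == 0:
        run_worker()
    gc.unfreeze()  # parent only: resume normal collection of those objects
    return pid

pid = spawn_worker()
os.waitpid(pid, 0)
```

The key point is that freeze happens strictly before the fork and unfreeze strictly after, and only in the parent.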

2. Should we maintain LocalExecutor's current worker creation method?
The current lazy worker creation would require repeating the 
gc.freeze/unfreeze cycle for every fork, potentially affecting scheduler 
performance. Pre-creating workers at parallelism level (as in v2.x) would be 
more efficient. Alternatively, we could fork workers from a gc.freeze'd 
snapshot process (scheduler -> fork (gc.freeze'd snapshot) -> worker), but 
this adds complexity.

3. Should the Job class be coupled with the executor?
The executor is currently [loaded and defined as a field in the Job 
class](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/jobs/job.py#L162),
 and the scheduler is the only class that uses it. It would be more intuitive 
to separate the executor loading explicitly and pass it as an argument to 
SchedulerJobRunner. This would also prevent unnecessary memory from being 
forked into workers in the LocalExecutor case (even once the COW issue is 
resolved).
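A hypothetical sketch of the proposed separation (class bodies and names here are illustrative stand-ins, not the actual Airflow API):

```python
class Job:
    """Heartbeat/bookkeeping only; no executor field is initialized here."""
    def __init__(self) -> None:
        self.heartrate = 5

class SchedulerJobRunner:
    """Receives the executor explicitly instead of reading it off the Job."""
    def __init__(self, job: Job, executor: object) -> None:
        self.job = job
        self.executor = executor  # explicit dependency, visible at the call site

# The caller loads the executor once (stand-in for ExecutorLoader.load_executor)
executor = object()
runner = SchedulerJobRunner(Job(), executor)
```

With this shape, code paths that create a Job without running a scheduler never pull the executor (and its memory) into the process at all.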

PS. I'd like to document memory management insights from this investigation 
(e.g., which objects are heavy and should be lazy-loaded). Is there existing 
documentation for this?

GitHub link: https://github.com/apache/airflow/discussions/58143
