GitHub user wjddn279 created a discussion: Discussion: Preventing COW in LocalExecutor Workers
I investigated the final part of [this issue](https://github.com/apache/airflow/issues/56641). [kaxil's work](https://github.com/apache/airflow/issues/55768#issuecomment-3461520365) resolved the first two problems in version 3.1.2, but LocalExecutor still shows a continuous memory increase. Using smem, I observed worker processes that start at 20-30MB grow to over 100MB after 1-2 hours: memory rises steadily to ~50MB, then suddenly jumps to ~100MB ([another case](https://github.com/apache/airflow/issues/56641#issuecomment-3479139319)).

As [Jarek mentioned](https://github.com/apache/airflow/issues/56641#issuecomment-3407182357), I confirmed this is caused by copy-on-write (COW) against the parent process: examining the worker process with pmap showed increased PSS for memory regions shared with the parent.

Initially I tried minimizing pre-fork memory allocation, but this was ineffective, since initialization loads heavy libraries/objects (sqlalchemy, fastapi, LazyDeserializedDAG, etc.) consuming at least 150MB. I then found [this article](https://instagram-engineering.com/copy-on-write-friendly-python-garbage-collection-ad6ed5233ddf), which describes how `gc.freeze` prevents COW by moving objects into a permanent generation, and I confirmed this prevents COW in our workers. (Benchmark results will be attached with the PR.)

Before sharing the solution code, I'd like to discuss:

1. **Are there side effects of `gc.freeze`?** The approach is simple and works well for us, but I'd like others' opinions on potential issues. This would only apply to LocalExecutor; by using `gc.freeze` -> worker fork -> `gc.unfreeze` (to release objects from the permanent generation), it prevents COW without affecting scheduler operations.
2. **Should we maintain LocalExecutor's current worker creation method?** The current lazy loading would require repeating the `gc.freeze`/`gc.unfreeze` cycle for every worker, potentially affecting scheduler performance. Pre-creating workers up to the parallelism level (as in v2.x) would be more efficient.
   Alternatively, we could fork workers from a `gc.freeze`'d snapshot process (scheduler -> fork (frozen snapshot) -> worker), but this adds complexity.
3. **Should the Job class be coupled with the executor?** The executor is currently [loaded and defined as a field on the Job class](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/jobs/job.py#L162), but the only class that uses it is the scheduler. It would be more intuitive to explicitly separate the executor loading and pass it as an argument to SchedulerJobRunner. This would also avoid forking unnecessary memory in the LocalExecutor case (even once the COW issue is resolved).

PS. I'd like to document memory management insights from this investigation (e.g., which objects are heavy and should be lazy-loaded). Is there existing documentation for this?

GitHub link: https://github.com/apache/airflow/discussions/58143
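For context, here is a minimal sketch of the `gc.freeze` -> fork -> `gc.unfreeze` pattern from question 1. The `spawn_worker` helper is a hypothetical name for illustration, not the actual LocalExecutor code; the key point is that `gc.freeze()` moves all tracked objects into the permanent generation before the fork, so the child's cyclic GC never touches their headers and the shared pages stay clean, while the parent calls `gc.unfreeze()` afterwards so its own GC behaves normally:

```python
import gc
import os


def spawn_worker(target):
    """Fork a worker while keeping the parent's heap COW-friendly.

    gc.freeze() moves all currently tracked objects into the permanent
    generation, so the garbage collector in the forked child never
    visits them (visiting would rewrite their GC headers and dirty
    pages shared with the parent).
    """
    gc.collect()  # drop garbage first so freeze doesn't pin it
    gc.freeze()   # move live objects to the permanent generation
    pid = os.fork()
    if pid == 0:
        # Child: frozen objects are never examined by the cyclic GC,
        # so their memory pages remain shared with the parent.
        try:
            target()
            os._exit(0)
        except BaseException:
            os._exit(1)
    # Parent: release objects from the permanent generation so its
    # own collector resumes normal operation.
    gc.unfreeze()
    return pid
```

The open question above is whether repeating this freeze/unfreeze cycle for every lazily created worker has side effects on scheduler GC behavior.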
