potiuk commented on code in PR #58365:
URL: https://github.com/apache/airflow/pull/58365#discussion_r2531951331
##########
airflow-core/src/airflow/executors/local_executor.py:
##########
@@ -186,9 +193,14 @@ def _check_workers(self):
         # via `sync()` a few times before the spawned process actually starts picking up messages. Try not to
         # create too much
         if num_outstanding and len(self.workers) < self.parallelism:
-            # This only creates one worker, which is fine as we call this directly after putting a message on
-            # activity_queue in execute_async
-            self._spawn_worker()
+            if self.is_mp_using_fork:
+                # This creates the maximum number of worker processes at once
Review Comment:
I see why - and it is a bit of a trade-off - we immediately spawn the maximum number of local executor workers here, even if we might not need them all.

@ashb - you were likely more around when the earlier decision was made to spawn one worker at a time (which was understandable) - but with the COW-related memory growth in mind, I have a feeling that this approach will be far more efficient and not particularly memory-hungry even if we fork all workers **immediately**. Essentially, with `gc.freeze()` and the resulting COW avoidance, the new workers will take **almost** no additional memory - not until they actually start processing tasks. The overhead is mostly the extra processes, open file descriptors and the like - which is not **much**, and with even the slightest spike in task processing it would happen anyway.

I can imagine that before `gc.freeze()` we wanted to avoid starting the processes up front, because garbage collection running in the background triggered copy-on-write almost immediately, so a lot of memory was allocated very quickly. In this case, though, I believe the overhead of starting all workers immediately is quite small, and avoiding multiple freeze/unfreeze cycles will make it far more efficient.
WDYT?