potiuk commented on code in PR #58365:
URL: https://github.com/apache/airflow/pull/58365#discussion_r2531951331


##########
airflow-core/src/airflow/executors/local_executor.py:
##########
@@ -186,9 +193,14 @@ def _check_workers(self):
         # via `sync()` a few times before the spawned process actually starts picking up messages. Try not to
         # create too much
         if num_outstanding and len(self.workers) < self.parallelism:
-            # This only creates one worker, which is fine as we call this directly after putting a message on
-            # activity_queue in execute_async
-            self._spawn_worker()
+            if self.is_mp_using_fork:
+                # This creates the maximum number of worker processes at once

Review Comment:
   I see why - and it's a bit of a trade-off - we immediately spawn the 
maximum number of worker processes here, even if we might not need them. 
   
   @ashb - you were likely more involved when the decision was made to spawn 
one worker at a time (which was understandable) - but with the memory leak 
with COW, I have a feeling that this approach will be way more efficient and 
less memory hungry even if we fork all workers **immediately** - because 
essentially, with gc_freeze avoiding COW, those new workers will take 
**almost** no additional memory until they actually start processing tasks. 
So the overhead is mostly an additional process, open file descriptors and 
the like - but that's not **much**, and with even the slightest spike in 
tasks to process, those workers would be spawned anyway.
   
   I can imagine that, before gc_freeze, we wanted to avoid starting the 
processes early: garbage collection running in the background triggered COW 
almost immediately, so a lot of memory was allocated very quickly. But I 
believe that in this case the overhead of starting all workers immediately is 
quite small, and avoiding multiple freeze/unfreeze cycles will make it far 
more efficient.
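   
   To illustrate what I mean, here is a minimal sketch of the 
freeze-once-then-fork-everything idea. It is not the code in this PR: 
`spawn_all_workers` and `_worker` are hypothetical names, and it assumes a 
POSIX platform where the `fork` start method is available. It skips 
`gc.collect()` before forking, since the `gc` docs warn that a collection 
right before `fork()` can itself free pages and cause COW later.

```python
import gc
import multiprocessing as mp


def _worker(task_queue, result_queue):
    # Hypothetical worker loop: double each item until a None sentinel arrives.
    while True:
        item = task_queue.get()
        if item is None:
            break
        result_queue.put(item * 2)


def spawn_all_workers(parallelism):
    # Assumption: the "fork" start method exists on this platform (POSIX).
    ctx = mp.get_context("fork")
    task_queue = ctx.Queue()
    result_queue = ctx.Queue()

    # Move every object tracked so far into the permanent generation, so the
    # collector in the forked children does not touch (and copy) those pages.
    gc.freeze()
    try:
        workers = [
            ctx.Process(target=_worker, args=(task_queue, result_queue))
            for _ in range(parallelism)
        ]
        # Fork all workers inside a single freeze/unfreeze cycle.
        for worker in workers:
            worker.start()
    finally:
        # Let the parent collect those objects again.
        gc.unfreeze()
    return workers, task_queue, result_queue
```

   With one worker per call, every spawn would need its own freeze/unfreeze 
pair; here the parent pays for it once, whatever `parallelism` is.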
   
   WDYT? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
