potiuk commented on PR #39484:
URL: https://github.com/apache/airflow/pull/39484#issuecomment-2100687710

   This is a very risky change if you have not thoroughly tested various 
configuration and I think it's for the worse on production systems. When you 
are using it on MacOS, yes it will be a bit slow because default 
multiprocessing context on MacOS is `spawn` 
https://docs.python.org/3/library/multiprocessing.html#multiprocessing-start-methods
 but on linux it is `fork` and this implementation is optimized for production 
case where Airflow is run on Linux.
   
   Have you considered impact on production for this change? 
   
   The overhead for production (Linux and system where fork is supported) is 
minimal, because `fork` model is used for the processes. Forking a process that 
does very little is very inexpensive in this model, because the heap memory is 
in "copy on write" mode, which means that forking is nearly instanteneous and 
until some memory is modiified, no extra memory is used.
   
   On the other hand, the change you propose will impact memory used in a long 
term as the long running processes might accumulate memory when run for a long 
time and run different callables. So it might cause memory leak, not mentioning 
that in the original model the forked processes are only needed for a very 
short time - to send tasks to celery and for example they are not consuming any 
resources when there are no tasks to run (because they are stopped right after 
they complete,
   
   Have you tested it beyond MacOS to see what kind of performance gains you 
have and what impact on memory usage and resource usage at scale? 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to