potiuk commented on issue #56641:
URL: https://github.com/apache/airflow/issues/56641#issuecomment-3407182357

   > [@potiuk](https://github.com/potiuk)
   > 
   > Thank you for your response. I’m planning to create a PR for this work, 
but are there any matters that need further discussion before doing so?
   
   It looks very clear from your investigation, so any details on how to handle 
this further can be discussed in the PR.
   
   Just one comment on the last point: I guess the root cause of the problem is 
the way Python reference counting works. The increased memory consumption 
is most likely caused by copy-on-write, and it might be a by-product of some 
of the pre-fork initialization. For C and other low-level programs, forking with 
copy-on-write works really well, but in Python, any access to an existing object 
from the forked process, or passing the object to another "user", increases the 
object's reference count. Since the reference count is stored in the same memory 
as the object itself, copy-on-write might be triggered not only 
when an object is modified, but also when it is merely used. 
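   
   A minimal sketch of that effect, assuming a standard CPython build (the 
helper name `refcount_delta_when_referenced` is made up for illustration): 
binding one more name to an object does not change its contents, yet its 
reference count, which lives in the object's own header, is written to.

```python
import sys


def refcount_delta_when_referenced(obj):
    """Return how much obj's reference count grows when one more name refers to it."""
    before = sys.getrefcount(obj)
    alias = obj  # obj is not modified here, we only take another reference
    after = sys.getrefcount(obj)
    del alias  # dropping the reference writes to the object header again
    return after - before


payload = [1, 2, 3]
delta = refcount_delta_when_referenced(payload)
print(f"reference count grew by {delta} although the list was only read")
```

   In a forked child, that write to the header is enough to make the kernel 
copy the page that holds the object.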
   
   For example, this might happen when objects are garbage collected (which is 
likely to happen when you fork after initializing many of them). In this case, 
as surprising as it is, just garbage-collecting objects might **increase** 
the memory used by the forked process, because garbage-collected objects might share 
memory pages with other objects, and the whole page then needs to be 
copied.
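   
   One possible mitigation, not something established in this issue, is the 
pattern described in the Python `gc` module documentation: disable the 
collector, call `gc.freeze()` right before `fork()`, and re-enable the 
collector in the child. A rough sketch for POSIX systems, where 
`fork_with_frozen_heap` and `child_work` are hypothetical names:

```python
import gc
import os


def fork_with_frozen_heap(child_work):
    """Fork after freezing GC-tracked objects; return the child's exit code."""
    was_enabled = gc.isenabled()
    gc.disable()   # avoid a collection between freeze() and fork()
    gc.freeze()    # move all tracked objects to the permanent generation
    pid = os.fork()
    if pid == 0:
        # Child: collections from now on skip the frozen (pre-fork) objects.
        gc.enable()
        try:
            child_work()
            code = 0
        except Exception:
            code = 1
        os._exit(code)  # leave without running the parent's cleanup handlers
    # Parent: restore the previous collector state and wait for the child.
    if was_enabled:
        gc.enable()
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    print(fork_with_frozen_heap(lambda: sum(range(1000))))
```

   This only keeps the collector away from pre-fork objects. Reference-count 
updates on those objects still write to their pages.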
   
   So what you observed is quite likely this: some objects get initialized 
before the fork, and then a lot of memory is copied on write even if only single 
objects are accessed (copy-on-write is done per page, and a page can contain many objects).
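   
   To get a feel for the granularity, here is a back-of-the-envelope sketch. 
The helper names are made up for illustration, and the numbers depend on the 
platform's page size and on how the allocator happens to lay objects out:

```python
import mmap
import sys

PAGE_SIZE = mmap.PAGESIZE              # bytes per memory page on this machine
OBJECT_SIZE = sys.getsizeof(object())  # size of a minimal Python object


def objects_per_page(object_size, page_size=PAGE_SIZE):
    """How many objects of the given size fit into one page."""
    return page_size // object_size


def worst_case_copied_bytes(touched_objects, page_size=PAGE_SIZE):
    """Upper bound on copied bytes if every touched object sits on a different shared page."""
    return touched_objects * page_size


print(f"page size: {PAGE_SIZE} bytes, minimal object: {OBJECT_SIZE} bytes")
print(f"minimal objects per page: {objects_per_page(OBJECT_SIZE)}")
print(f"worst case for 1000 touched objects: {worst_case_copied_bytes(1000)} bytes")
```

   So touching one small object can cost a full page of private memory in the 
child.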


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
