potiuk commented on pull request #19860:
URL: https://github.com/apache/airflow/pull/19860#issuecomment-983520739


   OK. I think I am closer to understanding how a reload in a "spawned" process 
can influence shared memory/resources between the spawned processes and the 
main one. From 
https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
   
   > On Unix using the spawn or forkserver start methods will also start a 
resource tracker process which tracks the unlinked named system resources (such 
as named semaphores or SharedMemory objects) created by processes of the 
program. When all processes have exited the resource tracker unlinks any 
remaining tracked object. Usually there should be none, but if a process was 
killed by a signal there may be some “leaked” resources. (Neither leaked 
semaphores nor shared memory segments will be automatically unlinked until the 
next reboot. This is problematic for both objects because the system allows 
only a limited number of named semaphores, and shared memory segments occupy 
some space in the main memory.)
   
   This is happening as we can see:
   
   ├─1446  (root) [python] 02:26 /usr/local/bin/python -B -c from 
multiprocessing.resource_tracker import main;main(30)
   
   It looks like the "spawn" method did indeed work and the resource tracker 
is tracking named resources. This means that some named resources can be 
shared between the processes (this is likely what the DB drivers do), and it 
basically means that reloading "airflow.settings" in one process **might** 
potentially change the state of the SQLAlchemy session in the process it was 
spawned from.
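   
   As a rough illustration of what "named resources shared between processes" 
means at the Python level, here is a minimal sketch using 
`multiprocessing.shared_memory`. The function names are made up for this 
example, and it only shows explicit sharing by name. It is not evidence that 
the DB drivers or SQLAlchemy do anything like this.

```python
from multiprocessing import get_context, shared_memory


def attach_and_write(name: str) -> None:
    """Attach to an existing named block and modify it (hypothetical helper)."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        shm.buf[0] = 42
    finally:
        shm.close()


def demo_same_process() -> int:
    """Open a second handle by name in the same process and write through it."""
    shm = shared_memory.SharedMemory(create=True, size=1)
    try:
        shm.buf[0] = 0
        attach_and_write(shm.name)
        return shm.buf[0]
    finally:
        shm.close()
        shm.unlink()


def demo_spawned() -> int:
    """Same as above, but the write happens in a child started with "spawn"."""
    shm = shared_memory.SharedMemory(create=True, size=1)
    try:
        shm.buf[0] = 0
        ctx = get_context("spawn")
        proc = ctx.Process(target=attach_and_write, args=(shm.name,))
        proc.start()
        proc.join()
        return shm.buf[0]
    finally:
        shm.close()
        shm.unlink()


if __name__ == "__main__":
    # The spawned child re-imports this module, hence the __main__ guard.
    print("same process:", demo_same_process())
    print("spawned child:", demo_spawned())
```

   When run as a script, the parent should see the byte written by the 
spawned child, because both handles refer to the same named segment rather 
than to a per-process copy.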
   
   What I do not know yet is why the spawned process was not killed by 
tearDown in the "spawned" test, but I think this is not really relevant.
   
   I do not believe it was caused by Pytest. IMHO this is real and very 
dangerous, and it **might** cause a problem in real-life scenarios. This 
basically means that the behaviour we observe in tests might also happen "in 
reality". It might modify/remove the objects that are stored in the session, 
and a number of wrong scheduler behaviours might occur: basically the same 
class of problems we observe in our flaky tests, and extremely difficult to 
diagnose and debug in real life.
   
   I am close to proposing that the "spawn" method should be deprecated and 
eventually removed from Airflow. IMHO this is the best course of action to 
take. 
   
   @ashb @ephraimbuddy @kaxil ? Others - WDYT? 
   



