potiuk commented on pull request #19860: URL: https://github.com/apache/airflow/pull/19860#issuecomment-983520739
OK. I think I am closer to understanding how a reload in a "spawned" process can influence shared memory/resources between the spawned processes and the main one.

From https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods

> On Unix using the spawn or forkserver start methods will also start a resource tracker process which tracks the unlinked named system resources (such as named semaphores or SharedMemory objects) created by processes of the program. When all processes have exited the resource tracker unlinks any remaining tracked object. Usually there should be none, but if a process was killed by a signal there may be some “leaked” resources. (Neither leaked semaphores nor shared memory segments will be automatically unlinked until the next reboot. This is problematic for both objects because the system allows only a limited number of named semaphores, and shared memory segments occupy some space in the main memory.)

This is happening, as we can see:

    ├─1446 (root) [python] 02:26 /usr/local/bin/python -B -c from multiprocessing.resource_tracker import main;main(30)

It looks like the "spawn" method indeed worked and the resource tracker is tracking named resources. This means that some named resources can be shared between the processes (this is likely what the DB drivers do), and it basically means that reloading "airflow.settings" in one process **might** potentially change the state of the SQLAlchemy session in the process it was spawned from.

What I do not know yet is why the spawned process was not killed by tearDown in the "spawned" test, but I think this is not really relevant. I do not believe it was caused by Pytest. IMHO this is a real-life issue and a very dangerous one, and it **might** cause problems in real-life scenarios. This basically means that the behaviour we observe in tests might happen "in reality".
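For reference, here is a minimal sketch of what a "named system resource" from the quoted docs looks like in practice. It uses only the standard-library `multiprocessing.shared_memory` module. The helper name `roundtrip` is made up for illustration. It only shows that a segment created under a name can be reached again purely by that name (here from the same process, for simplicity). It does not show that SQLAlchemy or any DB driver actually uses such resources; that part remains a hypothesis.

```python
from multiprocessing import shared_memory


def roundtrip(payload: bytes) -> bytes:
    """Write payload into a named segment and read it back via a second handle."""
    # Create a named segment. On POSIX, CPython registers the name with the
    # resource tracker so it can be unlinked if the creator dies without cleanup.
    owner = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        owner.buf[: len(payload)] = payload
        # Attach a second handle using nothing but the name, the way an
        # unrelated process that knows the name could.
        peer = shared_memory.SharedMemory(name=owner.name)
        try:
            return bytes(peer.buf[: len(payload)])
        finally:
            peer.close()
    finally:
        owner.close()
        # Without unlink() the segment would stay around as a "leaked" resource.
        owner.unlink()


if __name__ == "__main__":
    print(roundtrip(b"airflow"))
```

The point of the sketch is only that the name, not process ancestry, is what gives access to the segment.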
Such a reload might modify or remove the objects stored in the session, and a number of wrong scheduler behaviours might follow: basically the same class of problems we observe in our flaky tests, and extremely difficult to diagnose and debug in real life.

I am close to proposing that the "spawn" method be deprecated and eventually removed from Airflow. I think this is the best course of action to take.

@ashb @ephraimbuddy @kaxil ? Others - WDYT?
