potiuk commented on issue #22404:
URL: https://github.com/apache/airflow/issues/22404#issuecomment-1075381813


   I did some testing and I think it comes from the way we run tasks via fork:
   
    ```python
    import os
    import tempfile
    from time import sleep


    def test():
        # TemporaryDirectory cleans up via a finalizer, which only runs on
        # garbage collection or normal interpreter shutdown
        tmpdir = tempfile.TemporaryDirectory()
        print(f"directory {tmpdir.name} created")
        assert os.path.exists(tmpdir.name)
        raise Exception("exiting")


    pid = os.fork()
    if pid:
        # parent: give the child time to finish
        sleep(2)
    else:
        # child: run the task, then exit without running cleanup handlers
        try:
            test()
        finally:
            os._exit(0)
    ```
   
    This is exactly what we do to run the task, and the tmp directory is not deleted there either. When I replace os._exit() with sys.exit(), the directory is deleted. But using sys.exit() in a forked process is wrong, and os._exit() is the right call:
   
   Via: https://docs.python.org/3/library/os.html#os._exit
   
   > os._exit(n)
   > Exit the process with status n, without calling cleanup handlers, flushing 
stdio buffers, etc.
   > Note The standard way to exit is sys.exit(n). _exit() should normally only 
be used in the child process after a fork().
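    The difference is easy to see with a plain atexit handler (a minimal sketch using only the standard library; the marker file and handler are just for illustration):

    ```python
    import atexit
    import os
    import sys
    import tempfile

    # a file the handler appends to, so we can observe whether it ran
    marker = tempfile.NamedTemporaryFile(delete=False)
    marker.close()

    def handler(tag):
        with open(marker.name, "a") as f:
            f.write(tag + "\n")

    def run_child(tag, exit_fn):
        pid = os.fork()
        if pid:
            os.waitpid(pid, 0)  # parent: wait for the child to exit
        else:
            # child: register a cleanup handler, then exit
            atexit.register(handler, tag)
            exit_fn(0)

    run_child("sys.exit child", sys.exit)  # cleanup handlers run on exit
    run_child("os._exit child", os._exit)  # cleanup handlers are skipped

    with open(marker.name) as f:
        tags = f.read().splitlines()
    assert tags == ["sys.exit child"]  # only the sys.exit child wrote a line
    os.unlink(marker.name)
    ```

    Only the child that exits via sys.exit() runs its atexit handlers; the os._exit() child skips them, which is exactly what happens to the TemporaryDirectory finalizer above.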
   
    We are (correctly) exiting forked processes via os._exit(). We cannot use sys.exit() because we must not run finalizers that were registered in the parent process before the fork (running them prematurely in the child could, as I understand it, corrupt shared resources such as shared memory). I think there is not much we can do other than warn users that they should not rely on finalizers in Airflow tasks; I see no easy way to run only the finalizations registered after the fork.
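
    For code we control, the workaround is to make cleanup deterministic instead of finalizer-based, e.g. by using TemporaryDirectory as a context manager so the directory is removed when the block is left, before any os._exit() happens (a minimal sketch):

    ```python
    import os
    import tempfile

    created = []  # record the path so we can check it afterwards

    def task():
        # The context manager removes the directory when the with-block is
        # left -- even via an exception -- so cleanup does not depend on
        # the interpreter-exit handlers that os._exit() skips.
        with tempfile.TemporaryDirectory() as name:
            created.append(name)
            assert os.path.exists(name)
            raise Exception("exiting")

    try:
        task()
    except Exception:
        pass

    # The directory is already gone, even though the task raised.
    assert not os.path.exists(created[0])
    ```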
   
   @ashb - any comments on that?

