potiuk edited a comment on issue #22404:
URL: https://github.com/apache/airflow/issues/22404#issuecomment-1075381813
I did some testing and I think it comes from the way we run tasks via fork:
```python
import os
from time import sleep
import tempfile

def test():
    tmpdir = tempfile.TemporaryDirectory()
    print(f"directory {tmpdir.name} created")
    assert os.path.exists(tmpdir.name)
    raise Exception("exiting")

pid = os.fork()
if pid:
    sleep(2)  # parent: give the child time to run
else:
    try:
        test()
    finally:
        os._exit(0)  # child: exit without running cleanup handlers
```
This is essentially what we do to run a task, and the temporary directory is not deleted here either. When I replace os._exit() with sys.exit(), the directory is deleted. But using sys.exit() in a forked child is wrong, and os._exit() is correct:
Via: https://docs.python.org/3/library/os.html#os._exit
> os._exit(n)
> Exit the process with status n, without calling cleanup handlers, flushing stdio buffers, etc.
> Note: The standard way to exit is sys.exit(n). _exit() should normally only be used in the child process after a fork().
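For comparison, here is a variant of my sketch above (my own test code, not Airflow's) where the child exits with sys.exit(0) and reports the directory name back to the parent over a pipe. The parent then observes that the directory was removed, because SystemExit lets interpreter shutdown run the weakref.finalize callback that TemporaryDirectory registers (POSIX-only, since it uses os.fork()):

```python
import os
import sys
import tempfile

r, w = os.pipe()
pid = os.fork()
if pid:
    os.close(w)
    with os.fdopen(r) as f:
        name = f.read()      # blocks until the child closes its end
    os.waitpid(pid, 0)       # make sure the child has fully exited
    child_cleaned = not os.path.exists(name)
    print(f"directory {name} removed: {child_cleaned}")
else:
    os.close(r)
    tmpdir = tempfile.TemporaryDirectory()
    with os.fdopen(w, "w") as f:
        f.write(tmpdir.name)
    # sys.exit() raises SystemExit, so interpreter shutdown runs the
    # weakref.finalize callback that deletes tmpdir -- unlike os._exit().
    sys.exit(0)
```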
We are (correctly) exiting forked children via os._exit(). We cannot call sys.exit() because we must not run finalizers that were registered in the parent process before the fork happened (running them prematurely in the child could, from what I understand, corrupt resources shared with the parent). I think there is not much we can do other than warn users that they should not rely on finalizers running in Airflow tasks. I see no easy way to run only the finalizers registered after the fork.
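One thing users can do today (a sketch of the general pattern, not an Airflow API) is to scope temporary resources explicitly, e.g. with a context manager or try/finally, so cleanup happens while the task's stack unwinds rather than in an interpreter-exit finalizer that os._exit() will skip:

```python
import os
import tempfile

def task():
    # The context manager form removes the directory in __exit__ during
    # normal stack unwinding -- even if the task raises -- instead of
    # relying on an interpreter-exit finalizer that os._exit() skips.
    with tempfile.TemporaryDirectory() as tmpdir:
        scratch = os.path.join(tmpdir, "scratch.txt")
        with open(scratch, "w") as f:
            f.write("work")
        return tmpdir  # returned only so we can check it below

path = task()
print(f"directory removed: {not os.path.exists(path)}")
```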
@ashb - any comments on that?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]