plockaby commented on a change in pull request #20147:
URL: https://github.com/apache/airflow/pull/20147#discussion_r831766704
##########
File path: airflow/task/task_runner/base_task_runner.py
##########
@@ -57,7 +57,8 @@ def __init__(self, local_task_job):
except AirflowConfigException:
self.run_as_user = None
- self._error_file = NamedTemporaryFile(delete=True)
+ self._error_file = NamedTemporaryFile(delete=False)
Review comment:
I think I am missing something.
Well, there are cases where the NamedTemporaryFile doesn't get cleaned up
automatically, too, like SIGKILL or a SIGTERM. I can't actually figure out how
on_finish gets called so I can't figure out if it would always be called.
So the `StandardTaskRunner` class uses the `run_command` function this base
class. I wrote a program to check file handles.
**lsof.py**
```
import os
import subprocess
def print_open_files(pid):
proc = subprocess.Popen(
["lsof", "-p", str(pid)],
stdout=subprocess.PIPE,
stderr=subprocess.DEVNULL,
universal_newlines=True,
)
print(proc.stdout.read())
def main():
pid = os.getpid()
print(f"open files for {pid}")
print_open_files(pid)
if __name__ == "__main__":
main()
```
**call.py**
```
import os
import subprocess
from tempfile import NamedTemporaryFile
def print_open_files(pid):
proc = subprocess.Popen(
["lsof", "-p", str(pid)],
stdout=subprocess.PIPE,
stderr=subprocess.DEVNULL,
universal_newlines=True,
)
print(proc.stdout.read())
def call_child():
proc = subprocess.Popen(
["python3", "lsof.py"],
stdout=subprocess.PIPE,
stderr=subprocess.DEVNULL,
universal_newlines=True,
close_fds=True,
env=os.environ.copy(),
preexec_fn=os.setsid,
)
print(proc.stdout.read())
def main():
# create a temporary file
pid = os.getpid()
x = NamedTemporaryFile(delete=False)
print(f"created temporary file {x.name} in {pid}")
print(f"open files for {pid}")
print_open_files(pid)
call_child()
x.close()
if __name__ == "__main__":
main()
```
When I tested this code, when the child process is called, it does not have
the file handle, so I'm not sure that it can be depending on this
functionality. But maybe my test is incorrect in some way, which could entirely
be possible, because I am not in any way an expert here on Airflow or forking.
In fact, when I modified my test programs to both write to the temporary file,
the changes made by lsof.py are completely overwritten by call.py when the
close happens after call.py is forked. If I close the file before forking then
the forked program writes to the temporary file then everything works as
expected.
So I'm not an expert on Airflow and I don't know all the intricacies of how
it does all of its stuff. This is just what I think will work to correctly
clean up temp files when impersonation is in use. I also don't have an Airflow
cluster anymore so I can't really test any of this in a production setting.
I'll attempt to write some tests for this, though.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]