plockaby commented on a change in pull request #20147:
URL: https://github.com/apache/airflow/pull/20147#discussion_r831766704



##########
File path: airflow/task/task_runner/base_task_runner.py
##########
@@ -57,7 +57,8 @@ def __init__(self, local_task_job):
             except AirflowConfigException:
                 self.run_as_user = None
 
-        self._error_file = NamedTemporaryFile(delete=True)
+        self._error_file = NamedTemporaryFile(delete=False)

Review comment:
       I think I am missing something.
   
   Well, there are cases where the NamedTemporaryFile doesn't get cleaned up 
automatically, too, like SIGKILL or a SIGTERM. I can't actually figure out how 
on_finish gets called so I can't figure out if it would always be called.
   
   So the `StandardTaskRunner` class uses the `run_command` function this base 
class. I wrote a program to check file handles.
   
   **lsof.py**
   ```
   import os
   import subprocess
   
   
   def print_open_files(pid):
       proc = subprocess.Popen(
           ["lsof", "-p", str(pid)],
           stdout=subprocess.PIPE,
           stderr=subprocess.DEVNULL,
           universal_newlines=True,
       )
       print(proc.stdout.read())
   
   
   def main():
       pid = os.getpid()
       print(f"open files for {pid}")
       print_open_files(pid)
   
   
   if __name__ == "__main__":
       main()
   ```
   
   **call.py**
   ```
   import os
   import subprocess
   from tempfile import NamedTemporaryFile
   
   
   def print_open_files(pid):
       proc = subprocess.Popen(
           ["lsof", "-p", str(pid)],
           stdout=subprocess.PIPE,
           stderr=subprocess.DEVNULL,
           universal_newlines=True,
       )
       print(proc.stdout.read())
   
   
   def call_child():
       proc = subprocess.Popen(
           ["python3", "lsof.py"],
           stdout=subprocess.PIPE,
           stderr=subprocess.DEVNULL,
           universal_newlines=True,
           close_fds=True,
           env=os.environ.copy(),
           preexec_fn=os.setsid,
       )
       print(proc.stdout.read())
   
   
   def main():
       # create a temporary file
       pid = os.getpid()
       x = NamedTemporaryFile(delete=False)
       print(f"created temporary file {x.name} in {pid}")
   
       print(f"open files for {pid}")
       print_open_files(pid)
   
       call_child()
       x.close()
   
   
   if __name__ == "__main__":
       main()
   ```
   
   When I tested this code, when the child process is called, it does not have 
the file handle, so I'm not sure that it can be depending on this 
functionality. But maybe my test is incorrect in some way, which could entirely 
be possible, because I am not in any way an expert here on Airflow or forking. 
In fact, when I modified my test programs to both write to the temporary file, 
the changes made by lsof.py are completely overwritten by call.py when the 
close happens after call.py is forked. If I close the file before forking then 
the forked program writes to the temporary file then everything works as 
expected.
   
   So I'm not an expert on Airflow and I don't know all the intricacies of how 
it does all of its stuff. This is just what I think will work to correctly 
clean up temp files when impersonation is in use. I also don't have an Airflow 
cluster anymore so I can't really test any of this in a production setting. 
I'll attempt to write some tests for this, though.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to