o-nikolas commented on a change in pull request #16264:
URL: https://github.com/apache/airflow/pull/16264#discussion_r653908548



##########
File path: airflow/providers/microsoft/azure/log/wasb_task_handler.py
##########
@@ -100,8 +100,8 @@ def close(self) -> None:
             with open(local_loc) as logfile:
                 log = logfile.read()
             self.wasb_write(log, remote_loc, append=True)
-
-            if self.delete_local_copy:
+            keep_local = conf.getboolean('logging', 'KEEP_LOCAL_LOGS')
+            if self.delete_local_copy or not keep_local:

Review comment:
       I wonder whether `delete_local_copy` is still needed now that you have 
introduced the global `KEEP_LOCAL_LOGS` setting.
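To make the question concrete, here is a minimal sketch that pulls the condition from the diff into a standalone function and prints every combination of the two flags. `should_delete_local` is a hypothetical name used only for illustration; it is not part of the PR.

```python
# Hypothetical helper that mirrors the condition added in the WASB diff:
#     if self.delete_local_copy or not keep_local:
def should_delete_local(delete_local_copy: bool, keep_local_logs: bool) -> bool:
    """Return True when the local log copy would be removed after upload."""
    return delete_local_copy or not keep_local_logs


# Print the full truth table for the two flags.
for delete_local_copy in (False, True):
    for keep_local_logs in (False, True):
        result = should_delete_local(delete_local_copy, keep_local_logs)
        print(f"delete_local_copy={delete_local_copy}, "
              f"KEEP_LOCAL_LOGS={keep_local_logs} -> delete={result}")
```

With `KEEP_LOCAL_LOGS=False` the local copy is deleted regardless of `delete_local_copy`, so the per-handler flag only changes the outcome when `KEEP_LOCAL_LOGS=True`, where it acts as an override.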

##########
File path: airflow/providers/google/cloud/log/gcs_task_handler.py
##########
@@ -132,7 +134,10 @@ def close(self):
             # read log and remove old logs to get just the latest additions
             with open(local_loc) as logfile:
                 log = logfile.read()
-            self.gcs_write(log, remote_loc)
+            success = self.gcs_write(log, remote_loc)
+            keep_local = conf.getboolean('logging', 'KEEP_LOCAL_LOGS')
+            if success and not keep_local:
+                shutil.rmtree(os.path.dirname(local_loc))

Review comment:
       You're implementing the same cleanup recipe several times, each gated on 
the same global config; both are signs that this logic belongs in a superclass. 
Doing it ad hoc like this leaves room for future developers of remote logging 
classes to forget or mis-implement it. The individual remote logging classes 
should only be responsible for uploading to their respective service; they 
shouldn't have to re-implement the cleanup.
   
   It is possible to teach `FileTaskHandler` to do this, but it would be tricky 
to make it work both for plain local logging and for the remote handlers that 
subclass it, and it is a bit smelly. Perhaps it's time for a new superclass, 
`RemoteTaskHandler`?
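A minimal sketch of what such a superclass might look like, using the template-method pattern: the base class reads the local file and owns the cleanup, while subclasses only implement the upload. Assumptions: `RemoteTaskHandler` is just the name proposed above, not an existing Airflow class; `write_remote`, `upload_and_cleanup`, and the toy `InMemoryTaskHandler` are hypothetical; and `keep_local_logs` is passed to the constructor instead of being read from `conf.getboolean('logging', 'KEEP_LOCAL_LOGS')` so the sketch runs without Airflow.

```python
import abc
import os
import shutil
from typing import Dict


class RemoteTaskHandler(abc.ABC):
    """Hypothetical base class: subclasses upload, the base class cleans up."""

    def __init__(self, keep_local_logs: bool) -> None:
        # In Airflow this would presumably be read from
        # conf.getboolean('logging', 'KEEP_LOCAL_LOGS'); it is injected here
        # so the sketch runs without Airflow installed.
        self.keep_local_logs = keep_local_logs

    @abc.abstractmethod
    def write_remote(self, log: str, remote_loc: str) -> bool:
        """Upload ``log`` to the remote service; return True on success."""

    def upload_and_cleanup(self, local_loc: str, remote_loc: str) -> bool:
        """Upload the local log, then delete its directory if allowed."""
        if not os.path.exists(local_loc):
            return False
        with open(local_loc) as logfile:
            log = logfile.read()
        success = self.write_remote(log, remote_loc)
        # The single shared cleanup recipe: only delete after a successful
        # upload, and only when the global setting does not ask to keep logs.
        if success and not self.keep_local_logs:
            shutil.rmtree(os.path.dirname(local_loc))
        return success


class InMemoryTaskHandler(RemoteTaskHandler):
    """Toy subclass standing in for a WASB/GCS handler; it stores logs in a dict."""

    def __init__(self, keep_local_logs: bool) -> None:
        super().__init__(keep_local_logs)
        self.store: Dict[str, str] = {}

    def write_remote(self, log: str, remote_loc: str) -> bool:
        self.store[remote_loc] = log
        return True
```

Gating the deletion on the upload's return value (as the GCS diff does) means a failed upload never destroys the only copy of the log, and a WASB or GCS subclass would only need to implement `write_remote`.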




-- 
This is an automated message from the Apache Git Service.