serrovsky opened a new pull request, #31955: URL: https://github.com/apache/airflow/pull/31955
This PR tries to fix a problem that I ran into a few weeks ago. If you store your logs locally, the log folder tree, which by default follows the pattern <dag_id>/<run_id>/<task_id>/<attempt_id>.log, grows quickly. After some time you end up with thousands of empty folders, and clean-logs.sh starts to consume a lot of memory because of the find command.

[Screenshot: an example of what was happening in our case]

The worker-log-groomer containers were consuming 6-7 GB of memory on average just to clean logs. 🤯

With this small change, I added a flag that allows us to delete not only the log files but also the folders left empty, since it doesn't make sense to keep empty folders.

[Screenshot: the real impact of this change]
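
For illustration, the cleanup described above can be sketched as a two-step find pass: delete old log files first, then delete the directories that are left empty. This is only a minimal sketch and not the actual change to clean-logs.sh. The function name clean_logs and its two arguments are hypothetical, and it assumes a find that supports -mindepth, -empty, and -delete (GNU and BSD find both do).

```shell
# Hypothetical helper, not taken from clean-logs.sh.
#   $1: root directory of the local logs
#   $2: retention period in days
clean_logs() {
  # Step 1: remove log files last modified more than $2 days ago.
  find "$1" -type f -name '*.log' -mtime +"$2" -delete

  # Step 2: remove directories that are now empty.
  # -delete implies -depth, so children are visited before their parents and
  # a whole <dag_id>/<run_id>/<task_id> chain collapses in a single pass.
  # -mindepth 1 keeps the log root itself even when it ends up empty.
  find "$1" -mindepth 1 -type d -empty -delete
}
```

Called as clean_logs /path/to/logs 15, this leaves only directories that still contain at least one file, so later find runs have far fewer entries to walk.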
