sam-dumont opened a new pull request, #62985:
URL: https://github.com/apache/airflow/pull/62985
<!-- SPDX-License-Identifier: Apache-2.0
https://www.apache.org/licenses/LICENSE-2.0 -->
Fix `CloudwatchTaskHandler` not deleting local log files after streaming to
CloudWatch.
In Airflow 3, logs stream to CloudWatch in real-time via structlog
processors, so `upload()` was a no-op. But `delete_local_copy` was never
honoured: local log files kept accumulating on shared storage indefinitely.
We hit this in production on EFS. Storage grew at ~31 GB/day from log
accumulation alone. After deploying this fix, net growth dropped to effectively
zero. A separate cleanup job removed the ~780 GB backlog that had built up.
**What changed:**
- `CloudWatchRemoteLogIO.upload()` deletes the local log parent directory
when `delete_local_copy` is True
- `CloudwatchTaskHandler.close()` calls `upload()` with the rendered log
path stored during `set_context()`
- Path traversal guard: `upload()` refuses to delete anything outside
`base_log_folder`
---
##### Was generative AI tooling used to co-author this PR?
- [X] Yes — Claude Code (Claude Opus 4.6)
Generated-by: Claude Code (Claude Opus 4.6) following [the
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]