xuang7 opened a new issue, #3681: URL: https://github.com/apache/texera/issues/3681
### **Problem Statement**

Currently, users can upload files to datasets without committing their changes. These "uploaded but uncommitted" files remain fully uploaded in object storage (e.g., MinIO, S3) indefinitely. Since LakeFS only tracks committed files, these uncommitted uploads fall outside of standard garbage collection processes. Over time, they accumulate and consume significant storage resources, creating architectural and operational challenges including unnecessary storage costs, resource waste, and difficulties in managing datasets.

Example: an uploaded but uncommitted file remains on disk indefinitely unless manually removed.

<img width="1002" height="516" alt="Image" src="https://github.com/user-attachments/assets/d9e61e5e-8341-41a7-991b-611361612c85" />

### **Proposed Solution**

Implement a configurable scheduled job that automatically removes uncommitted files after a specified retention period.
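As a rough illustration of the proposed job, the selection-and-delete step could look like the sketch below. It is not Texera's or LakeFS's actual implementation: `UploadedObject`, `find_expired_uncommitted`, `run_cleanup`, and the `delete` callback are hypothetical names, and the sketch assumes the caller can already list stored objects together with their upload time and commit status. Scheduling (cron, an in-process timer, etc.) and the real storage calls are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class UploadedObject:
    """Hypothetical record describing one object found in storage."""
    key: str
    uploaded_at: datetime
    committed: bool  # assumed to be True when the file belongs to a commit


def find_expired_uncommitted(
    objects: Iterable[UploadedObject],
    retention: timedelta,
    now: datetime,
) -> List[UploadedObject]:
    """Return uncommitted objects uploaded before the retention cutoff."""
    cutoff = now - retention
    return [o for o in objects if not o.committed and o.uploaded_at < cutoff]


def run_cleanup(
    objects: Iterable[UploadedObject],
    delete: Callable[[str], None],
    retention: timedelta,
    now: Optional[datetime] = None,
) -> List[str]:
    """Delete expired uncommitted objects and return the keys removed.

    `delete` is a caller-supplied function (hypothetical here) that removes
    a single object from the underlying store by key.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    expired = find_expired_uncommitted(objects, retention, now)
    for obj in expired:
        delete(obj.key)
    return [obj.key for obj in expired]
```

A scheduler would call `run_cleanup` periodically with the configured retention period, for example `timedelta(days=7)`. Keeping the selection logic separate from the delete callback makes it possible to dry-run the job by passing a callback that only logs the keys.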
