xuang7 opened a new issue, #3681:
URL: https://github.com/apache/texera/issues/3681

   ### **Problem Statement**
   Currently, users can upload files to a dataset without committing their 
changes. These "uploaded but uncommitted" files remain in object storage 
(e.g., MinIO, S3) indefinitely. Since LakeFS only tracks committed files, 
these uncommitted uploads fall outside of standard garbage collection 
processes. Over time they accumulate, leading to unnecessary storage costs, 
wasted resources, and difficulties in managing datasets.
   
   Example: an uploaded but uncommitted file remains on disk indefinitely 
unless manually removed.
   <img width="1002" height="516" alt="Image" 
src="https://github.com/user-attachments/assets/d9e61e5e-8341-41a7-991b-611361612c85"
 />
   
   
   ### **Proposed Solution**
   Implement a configurable scheduled job that automatically removes 
uncommitted files after a specified retention period.
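
A minimal sketch of the selection logic such a job might run on each tick, assuming the job can already list stored objects along with an upload time and a committed flag. `UploadedObject`, `select_expired_uncommitted`, `run_cleanup`, and the `delete` callback are hypothetical names used only for illustration; they are not part of Texera or LakeFS, and the scheduler itself is not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class UploadedObject:
    """Hypothetical record for one object found in object storage."""
    key: str
    uploaded_at: datetime
    committed: bool  # assumed to be derived from commit metadata


def select_expired_uncommitted(objects, retention, now=None):
    """Return keys of uncommitted objects at least `retention` old."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - retention
    return [o.key for o in objects
            if not o.committed and o.uploaded_at <= cutoff]


def run_cleanup(objects, retention, delete, now=None):
    """Call `delete(key)` for every expired uncommitted object."""
    expired = select_expired_uncommitted(objects, retention, now)
    for key in expired:
        delete(key)  # hypothetical callback wrapping the storage client
    return expired
```

Here `retention` would come from configuration (for example `timedelta(days=7)`), and committed files are never selected regardless of age.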
   

