eugenegujing commented on PR #5643: URL: https://github.com/apache/texera/pull/5643#issuecomment-4705572243
@Yicong-Huang Thanks! I think your idea helps a lot. I'm thinking of transforming this PR from deleting any uncommitted trash files to delivering an audit summary over a retention window instead.

For this PR, I am considering the safest first step:

A. Audit-only scheduled scan

- The job still scans expired upload sessions and stale uncommitted LakeFS objects using the same candidate-detection logic that cleanup would need.
- It does not abort multipart uploads, delete DB rows, or reset LakeFS objects.
- It only emits a round summary, for example: expiredSessionsFound, staleObjectsFound, errors, truncated.
- Candidate-level details can stay at debug level so we can inspect them when needed without making normal logs noisy.
- Cleanup stays disabled / non-destructive by default.

The important connection is that A is not a throwaway version: it establishes the candidate model and the scheduled scan path, but makes the output observational only. Then follow-up PRs can gradually evolve the same audit output into B:

B. Persisted audit history + optional cleanup action

- Persist audit runs and candidates in DB so admins can inspect historical trends and exact candidates.
- Build on the same fields emitted by the audit summary in A, instead of introducing a separate cleanup path later.
- Add an explicit admin-facing cleanup action or config-gated cleanup mode only after the persisted audit flow exists.
- If real deletion is added, it should be opt-in and reviewed in a separate PR, not enabled by default.

So this PR would focus on non-destructive observability first, while keeping the implementation shaped so later PRs can step-by-step develop it into persisted audit history and, eventually, an explicit cleanup action.

What do you think?
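
To make option A more concrete, here is a rough sketch of what one audit-only round could look like. It is written in Python purely for illustration and is not Texera code. Only the four summary field names (expiredSessionsFound, staleObjectsFound, errors, truncated) come from the proposal above; `AuditSummary`, `run_audit`, the `(id, last_touched)` candidate shape, and the `retention` / `limit` parameters are hypothetical names assumed for this example.

```python
import logging
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone

log = logging.getLogger("upload_audit")  # hypothetical logger name


@dataclass
class AuditSummary:
    # Field names taken from the proposal; everything else is assumed.
    expiredSessionsFound: int = 0
    staleObjectsFound: int = 0
    errors: int = 0
    truncated: bool = False


def _count_stale(kind, candidates, cutoff, limit):
    """Count candidates last touched before `cutoff`, reporting at most `limit`.

    Returns (found, errors, truncated). Nothing is deleted or modified.
    """
    found = errors = 0
    for ident, last_touched in candidates:
        try:
            if last_touched >= cutoff:
                continue  # still inside the retention window
        except TypeError:
            errors += 1  # missing or incomparable timestamp
            continue
        if found >= limit:
            return found, errors, True  # more candidates exist than we report
        found += 1
        # Candidate-level detail stays at debug level to keep normal logs quiet.
        log.debug("audit candidate kind=%s id=%s last_touched=%s",
                  kind, ident, last_touched)
    return found, errors, False


def run_audit(sessions, objects, now, retention, limit):
    """One observational round: scan, count, log a summary, change nothing."""
    cutoff = now - retention
    s_found, s_err, s_trunc = _count_stale("session", sessions, cutoff, limit)
    o_found, o_err, o_trunc = _count_stale("object", objects, cutoff, limit)
    summary = AuditSummary(
        expiredSessionsFound=s_found,
        staleObjectsFound=o_found,
        errors=s_err + o_err,
        truncated=s_trunc or o_trunc,
    )
    log.info("audit round summary: %s", asdict(summary))
    return summary
```

In this sketch the scan only reads candidate lists and returns a summary object, so a later persisted-history step (B) could store the same `AuditSummary` fields without a separate detection path.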
