nsivabalan commented on issue #6938: URL: https://github.com/apache/hudi/issues/6938#issuecomment-1283157298
Even setting replace commits aside, there is a more fundamental issue here. If you set automatic clean = false, the regular writer will never trigger a clean at all. The archiver, however, will still go ahead and keep archiving timeline files. If you then try to clean up much later with a separate clean process, it may not find some of the timeline files (because the archiver has already archived them), and so it may fail to clean up some of the data files belonging to those timeline files.

For now, I would advise relaxing the archiver in the regular ingestion pipeline so that there won't be dangling data files. I have created a follow-up Jira for us to work on this gap: https://issues.apache.org/jira/browse/HUDI-5054

Let me know if you need any more pointers or clarifications.
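
To make "relaxing the archiver" a bit more concrete, here is a minimal sketch of writer options for the ingestion pipeline. The numeric values, the `archiver_is_relaxed` helper and its rule of thumb are assumptions for illustration only, not something Hudi ships or checks; the idea is simply to keep more commits on the active timeline than can pile up between two runs of the separate clean job.

```python
# Illustrative writer options for the regular ingestion pipeline.
# The numbers are hypothetical; size them to how often the separate clean job runs.
writer_options = {
    "hoodie.clean.automatic": "false",   # writer never triggers clean inline
    "hoodie.keep.min.commits": "200",    # archiver keeps at least this many commits active
    "hoodie.keep.max.commits": "250",    # archiving kicks in above this many commits
}


def archiver_is_relaxed(options, commits_between_cleans, commits_retained):
    """Rough rule of thumb (an assumption, not a Hudi check): the active
    timeline should hold more commits than can accumulate between two runs
    of the separate clean job, plus the commits the cleaner retains."""
    min_commits = int(options["hoodie.keep.min.commits"])
    max_commits = int(options["hoodie.keep.max.commits"])
    return max_commits > min_commits > commits_between_cleans + commits_retained


# Example: the clean job runs roughly every 100 commits and retains 10.
print(archiver_is_relaxed(writer_options, commits_between_cleans=100, commits_retained=10))

# With PySpark, the options would be passed along the lines of:
#   df.write.format("hudi").options(**writer_options).mode("append").save(base_path)
# where df and base_path are placeholders for your own DataFrame and table path.
```

If the helper returns False for your ingestion rate, the archiver is likely to move timeline files out before the separate clean job gets to see them.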
