dongkelun commented on PR #5406: URL: https://github.com/apache/hudi/pull/5406#issuecomment-1120339108
> @dongkelun I guess there is some confusion with respect to the documentation for the said config. I have added a review comment [#4927 (comment)](https://github.com/apache/hudi/pull/4927#discussion_r867342264) to correct this. Also regarding the existing functionality, the code is fine in its correct form and no changes are required. To quote the documentation from java docs -
>
> `We assume that the max(query execution time) == commit_batch_time * config.getCleanerCommitsRetained().
>
> * This is 5 hours by default (assuming ingestion is running every 30 minutes). This is essential to leave the file
> * used by the query that is running for the max time.`
>
> Please let me know if you still have any doubts and I can explain it further.

| commit | retainTime(min) |
|-----|-----|
| 001.commit | 0 |
| 002.commit | 30 |
| 003.commit | 60 |
| 004.commit | 90 |
| 005.commit | 120 |
| 006.commit | 150 |
| 007.commit | 180 |
| 008.commit | 210 |
| 009.commit | 240 |
| 0010.commit | 270 |
| 0011.commit | 300 |

As you said, a file is retained for 5 hours by default, that is, 300 minutes. So by the time `0011.commit` completes, `001.commit` has already been retained for the full 300 minutes, and we should not keep `001.commit` when we clean.
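
To make the arithmetic in the table concrete, here is a minimal sketch in Python. It is not Hudi code: the function names are hypothetical, the 30-minute interval comes from the quoted javadoc, and the retained-commit count of 10 is an assumption chosen so that the window works out to the quoted 5 hours. Treating a commit as expired once its age reaches the window (`>=`) is the reading argued for in this comment, not something confirmed by the cleaner implementation.

```python
# Hypothetical illustration only; none of these names come from Hudi.
COMMIT_INTERVAL_MIN = 30  # ingestion interval assumed in the quoted javadoc
COMMITS_RETAINED = 10     # assumed retained-commit count (10 * 30 min = 5 hours)


def completion_times(num_commits, interval_min=COMMIT_INTERVAL_MIN):
    """Map commit number -> minutes elapsed since commit 1 completed."""
    return {i: (i - 1) * interval_min for i in range(1, num_commits + 1)}


def age_at_latest(times):
    """Map commit number -> its age in minutes when the latest commit completes."""
    latest = max(times.values())
    return {commit: latest - t for commit, t in times.items()}


times = completion_times(11)
ages = age_at_latest(times)

# Retention window implied by the javadoc: commit_batch_time * commits retained.
window_min = COMMIT_INTERVAL_MIN * COMMITS_RETAINED

# Commits whose age has reached the window under the ">=" reading.
expired = [commit for commit, age in ages.items() if age >= window_min]

print(window_min)
print(ages[1])
print(expired)
```

Under these assumptions the only commit whose age has reached the window when commit 11 completes is commit 1, which is the case discussed above.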
