dongkelun commented on PR #5406:
URL: https://github.com/apache/hudi/pull/5406#issuecomment-1120339108

   > @dongkelun I guess there is some confusion with respect to the
   > documentation for the said config. I have added a review comment
   > [#4927 (comment)](https://github.com/apache/hudi/pull/4927#discussion_r867342264)
   > to correct this. Also regarding the existing functionality, the code is
   > fine in its correct form and no changes are required. To quote the
   > documentation from java docs:
   >
   > > We assume that the max(query execution time) == commit_batch_time *
   > > config.getCleanerCommitsRetained(). This is 5 hours by default (assuming
   > > ingestion is running every 30 minutes). This is essential to leave the
   > > file used by the query that is running for the max time.
   >
   > Please let me know if you still have any doubts and I can explain it
   > further.
   
   | commit | time since 001.commit (min) |
   |-----|-----|
   | 001.commit | 0|
   | 002.commit | 30|
   | 003.commit | 60|
   | 004.commit | 90|
   | 005.commit | 120|
   | 006.commit | 150|
   | 007.commit | 180|
   | 008.commit | 210|
   | 009.commit | 240|
   | 0010.commit | 270|
   | 0011.commit | 300|
   
   As you said, a file is retained for 5 hours by default, that is, 300
minutes. So when `0011.commit` completes, `001.commit` has already been
retained for 300 minutes, so we should not keep `001.commit` when we clean.
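
To make the arithmetic above concrete, here is a minimal sketch. It is not Hudi's cleaner code: the function names are hypothetical, and the 30-minute interval and the retained-commit count of 10 are assumptions derived from the "5 hours by default" figure quoted above.

```python
# Illustrative only: models the timeline in the table, not Hudi's cleaner.
COMMIT_INTERVAL_MIN = 30  # assumed ingestion cadence (one commit every 30 minutes)
COMMITS_RETAINED = 10     # assumed: 5 hours / 30 minutes


def commit_ages(num_commits, interval_min=COMMIT_INTERVAL_MIN):
    """Age in minutes of each commit at the moment the latest commit completes."""
    return {i: (num_commits - i) * interval_min for i in range(1, num_commits + 1)}


def commits_past_window(num_commits, retained=COMMITS_RETAINED,
                        interval_min=COMMIT_INTERVAL_MIN):
    """Commits whose age has reached the assumed maximum query execution time."""
    window_min = retained * interval_min  # 10 * 30 = 300 minutes
    ages = commit_ages(num_commits, interval_min)
    return [i for i, age in ages.items() if age >= window_min]


if __name__ == "__main__":
    print(commit_ages(11)[1])       # age of commit 1 once commit 11 completes
    print(commits_past_window(11))  # commits that have aged out of the window
```

Under these assumptions, once the 11th commit completes, only the first commit has reached the 300-minute window, which is the case I am describing.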
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

