sumeetgajjar commented on PR #4825:
URL: https://github.com/apache/iceberg/pull/4825#issuecomment-1139330018

   > I’m not sure if you have seen this PR I opened today, but I noticed one of 
these tests fail in CI (details and link in PR summary). I opened the following 
PR for that test case (admittedly I forgot about this one so we can close mine 
if you’d like) - #4859
   > 
   > TLDR - As suggested by Russell, for the test case in my PR, we changed the 
“olderThan” argument to be further in the future. The benefit being no 
unnecessary busy waiting and all of the files are still caught. I went with 5 
seconds given that the solution in the other PR adds no busy waiting and that 
specific test removes _every_ orphan file so the olderThan argument doesn’t 
need to be very precise… just far enough “in the future” relative to the 
timestamp of the files that Spark writes to grab them all.
   
   Hi @kbendick - thanks for the suggestion. The test that you are fixing in 
#4859 (TestRemoveOrphanFilesAction#orphanedFileRemovedWithParallelTasks) 
requires all the files to be removed, so providing a future time ensures that 
the predicate selects all the files as candidates for removal.
   
   
   However, for this PR (TestRemoveOrphanFilesAction#testOlderThanTimestamp), 
we intentionally want to wait until a second has passed to avoid scenarios like 
[mtime from java is truncated to 
seconds](https://stackoverflow.com/questions/24804618/get-file-mtime-with-millisecond-resolution-from-java)
 where we lose millisecond precision when reading the last-modified time.
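
   To illustrate the truncation problem, here is a minimal, self-contained 
sketch. It is not code from this PR: the class name, the helper names, the 
timestamps, and the strict `<` comparison are all assumptions made for the 
example. It shows how a file written after the cutoff can still look "older 
than" the cutoff once its mtime is reported in whole seconds.

```java
class MtimeTruncation {
  // Drops the millisecond part, mimicking an API or file system that
  // reports modification time with one-second resolution (assumption).
  static long truncateToSeconds(long epochMillis) {
    return (epochMillis / 1000L) * 1000L;
  }

  // Hypothetical stand-in for an "older than" predicate; the strict
  // less-than comparison is assumed for illustration only.
  static boolean isOlderThan(long fileMtimeMillis, long olderThanMillis) {
    return fileMtimeMillis < olderThanMillis;
  }

  public static void main(String[] args) {
    long cutoff = 1_653_000_000_600L;          // olderThan timestamp
    long thirdWrite = 1_653_000_000_800L;      // 200 ms after the cutoff
    long delayedWrite = 1_653_000_001_700L;    // more than 1 s after the cutoff

    // With full millisecond precision the third write is correctly kept.
    System.out.println(isOlderThan(thirdWrite, cutoff)); // false

    // After truncation it falls back to the start of the second and is
    // wrongly selected for removal.
    System.out.println(isOlderThan(truncateToSeconds(thirdWrite), cutoff)); // true

    // Writing more than a second after the cutoff survives truncation.
    System.out.println(isOlderThan(truncateToSeconds(delayedWrite), cutoff)); // false
  }
}
```

   Truncation only ever moves an mtime backwards, so files written before the 
cutoff are unaffected; only the file written just after it is at risk.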
   
   
   The flow for this test is as follows:
   - Spark writes non-table files to tableLocation (first write)
   - Spark writes non-table files to tableLocation (second write)
   - Spark writes non-table files to tableLocation (third write)
   - delete orphan files such that only the files created by the first two 
writes are removed and the file created by the third write is preserved
   
   And because of this requirement to preserve the file from the third write, 
we'll have to busy wait for a second.
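
   The busy wait could look roughly like the sketch below. This is a 
hypothetical helper written for illustration, not necessarily the one used in 
this PR; the class name, the method name, and the one-second margin in `main` 
are assumptions.

```java
class BusyWait {
  // Spins until the wall clock is strictly past the given timestamp and
  // returns the time observed at that point.
  static long waitUntilAfter(long timestampMillis) {
    long now = System.currentTimeMillis();
    while (now <= timestampMillis) {
      now = System.currentTimeMillis();
    }
    return now;
  }

  public static void main(String[] args) {
    // Taken after the second write; used as the olderThan argument.
    long cutoff = System.currentTimeMillis();

    // Wait more than a full second before the third write so that its
    // mtime, even when truncated to seconds, is not below the cutoff.
    long thirdWriteTime = waitUntilAfter(cutoff + 1000L);

    long truncatedMtime = (thirdWriteTime / 1000L) * 1000L;
    System.out.println(truncatedMtime >= cutoff); // true
  }
}
```

   The loop costs about a second of CPU time per call, which is the trade-off 
for keeping the third file reliably on the "newer" side of the cutoff.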
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


