sumeetgajjar commented on PR #4825: URL: https://github.com/apache/iceberg/pull/4825#issuecomment-1139330018
> I’m not sure if you have seen this PR I opened today, but I noticed one of these tests fail in CI (details and link in PR summary). I opened the following PR for that test case (admittedly I forgot about this one so we can close mine if you’d like) - #4859
>
> TLDR - As suggested by Russell, for the test case in my PR, we changed the “olderThan” argument to be further in the future. The benefit being no unnecessary busy waiting and all of the files are still caught. I went with 5 seconds given that the solution in the other PR adds no busy waiting and that specific test removes _every_ orphan file so the olderThan argument doesn’t need to be very precise… just far enough “in the future” relative to the timestamp of the files that Spark writes to grab them all.

Hi @kbendick - thanks for the suggestion. However, the test that you are fixing in #4859 (TestRemoveOrphanFilesAction#orphanedFileRemovedWithParallelTasks) requires all the files to be removed, so providing a future time ensures that the Predicate selects all the files as candidates for removal.

For this PR (TestRemoveOrphanFilesAction#testOlderThanTimestamp), on the other hand, we intentionally want to wait until a second has passed to avoid scenarios like [mtime from java is truncated to seconds](https://stackoverflow.com/questions/24804618/get-file-mtime-with-millisecond-resolution-from-java), where we lose millisecond precision when reading the last-modification time.

The flow for this test is as follows:
- Spark writes non-table files to tableLocation (first write)
- Spark writes non-table files to tableLocation (second write)
- Spark writes non-table files to tableLocation (third write)
- delete orphan files such that only the files created by the first two writes are removed and the file created by the third write is preserved

Because of this caveat of preserving the file from the third write, we'll have to busy wait for a second.
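To make the timing argument concrete, here is a minimal sketch of the scenario, not the actual test or action code. It assumes the file system reports modification times truncated to whole seconds and that a file is selected when its reported mtime is strictly less than the `olderThan` cutoff. The class `OlderThanSketch` and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not Iceberg code.
final class OlderThanSketch {

  // Assumption: the file system reports mtime with one-second granularity.
  static long truncateToSeconds(long epochMillis) {
    return (epochMillis / 1000L) * 1000L;
  }

  // Mtimes as the file system would report them for the three writes.
  static Map<String, Long> reported(long first, long second, long third) {
    Map<String, Long> mtimes = new LinkedHashMap<>();
    mtimes.put("first", truncateToSeconds(first));
    mtimes.put("second", truncateToSeconds(second));
    mtimes.put("third", truncateToSeconds(third));
    return mtimes;
  }

  // Assumption: a file is a removal candidate when its mtime is strictly below the cutoff.
  static List<String> olderThan(Map<String, Long> reportedMtimes, long cutoffMillis) {
    List<String> selected = new ArrayList<>();
    for (Map.Entry<String, Long> entry : reportedMtimes.entrySet()) {
      if (entry.getValue() < cutoffMillis) {
        selected.add(entry.getKey());
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    long base = 1_653_000_000_000L; // an exact second boundary
    long cutoff = base + 300L;      // timestamp captured after the second write

    // No wait: the third write lands in the same second as the cutoff,
    // so its truncated mtime falls below the cutoff and it is wrongly selected.
    System.out.println(olderThan(reported(base + 100L, base + 200L, base + 400L), cutoff));
    // prints [first, second, third]

    // With a wait of at least one second before the third write,
    // its truncated mtime is no longer below the cutoff and it is preserved.
    System.out.println(olderThan(reported(base + 100L, base + 200L, base + 1400L), cutoff));
    // prints [first, second]
  }
}
```

Under these assumptions, moving the cutoff into the future would not help this test, because the third file must stay above the cutoff; only separating the third write from the cutoff by a full second makes the outcome independent of truncation.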
