steveloughran commented on issue #4346: URL: https://github.com/apache/iceberg/issues/4346#issuecomment-1117110626
fwiw, s3a and abfs in the not yet released hadoop branch-3.3 add an [EtagSource](https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/EtagSource.java) interface which FileStatus/LocatedFileStatus subclasses can implement. this lets you compare files: if the etag value is non-null and non-empty, then files with different etags are guaranteed to be different. i know that iceberg likes to build against very old versions of hadoop, but if you do leave space in the indices for file etags, plus some pluggable mechanism to retrieve them, then etag-based checking would work. note also that s3 and abfs return those etags in list operations, so there's no need to do HEAD calls on each file. i think gcs does the same, though it doesn't have support through its client yet.
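a rough sketch of what that check could look like. to keep it self-contained, the `EtagSource` interface here is a local stand-in that mirrors the single `getEtag()` method of `org.apache.hadoop.fs.EtagSource`; `Status`, `EtagStatus`, `etagOf` and `definitelyDifferent` are hypothetical names invented for the example, not hadoop or iceberg APIs. against a real hadoop build you would test the `FileStatus` returned by a listing with `instanceof` instead.

```java
import java.util.Optional;

class EtagCheck {
    /** Local stand-in mirroring org.apache.hadoop.fs.EtagSource (assumption: one getEtag() method). */
    interface EtagSource {
        String getEtag();
    }

    /** Hypothetical minimal file status with no etag support (e.g. an older filesystem client). */
    static class Status {
        final String path;
        Status(String path) { this.path = path; }
    }

    /** Hypothetical status subclass that carries the etag returned by a list operation. */
    static final class EtagStatus extends Status implements EtagSource {
        private final String etag;
        EtagStatus(String path, String etag) { super(path); this.etag = etag; }
        @Override public String getEtag() { return etag; }
    }

    /** Returns the etag only if the status exposes one and it is non-null and non-empty. */
    static Optional<String> etagOf(Object status) {
        if (status instanceof EtagSource) {
            String etag = ((EtagSource) status).getEtag();
            if (etag != null && !etag.isEmpty()) {
                return Optional.of(etag);
            }
        }
        return Optional.empty();
    }

    /**
     * True only when both statuses expose usable etags and those etags differ.
     * A false result means "cannot tell from etags", not "files are identical".
     */
    static boolean definitelyDifferent(Object a, Object b) {
        Optional<String> ea = etagOf(a);
        Optional<String> eb = etagOf(b);
        return ea.isPresent() && eb.isPresent() && !ea.get().equals(eb.get());
    }
}
```

the pluggable part is `etagOf`: a store without etag support just yields an empty value, and the caller falls back to whatever check it already uses.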
