steveloughran opened a new pull request, #15586: URL: https://github.com/apache/iceberg/pull/15586
Fixes #15353

Improve file opening and read times by
- keeping file status when known, and using it in the openFile() call to eliminate HEAD requests
- choosing the file input policy when reading a file (`Util.determineReadPolicy()`).

ParquetIO already hands file opening down to parquet, which does the right thing. What matters for it is retaining any FileStatus already obtained, which is what the changes in `TableMigrationUtil` do.

It's a shame that parquet (currently) lacks a way to skip the stat() call which it makes to get the file length, as this adds a HEAD request to every opening of a parquet file, even when the length is already known from a manifest. That is fixable and would save 100+ ms per file opening, as well as the associated IO capacity.
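To illustrate the idea of choosing a read policy per file, here is a minimal standalone sketch. It is not the code in this pull request: the class name `ReadPolicySketch`, the method signature, and the extension-to-policy mapping are all assumptions made for illustration. The option key `fs.option.openfile.read.policy` and the values `random`, `sequential`, and `adaptive` are the standard Hadoop `openFile()` names in recent releases.

```java
import java.util.Locale;

/** Hypothetical sketch only; the real Util.determineReadPolicy() may differ. */
public class ReadPolicySketch {

  /** Hadoop openFile() option key that carries the read policy. */
  public static final String READ_POLICY_KEY = "fs.option.openfile.read.policy";

  /**
   * Picks a read policy from the file name (assumed mapping):
   * columnar formats seek to footers and column chunks, so "random";
   * Avro files are scanned front to back, so "sequential";
   * anything else lets the filesystem adapt.
   */
  public static String determineReadPolicy(String location) {
    String name = location.toLowerCase(Locale.ROOT);
    if (name.endsWith(".parquet") || name.endsWith(".orc")) {
      return "random";
    }
    if (name.endsWith(".avro")) {
      return "sequential";
    }
    return "adaptive";
  }

  public static void main(String[] args) {
    String[] samples = {
      "s3a://bucket/table/data/00000.parquet",
      "s3a://bucket/table/metadata/snap-1.avro",
      "s3a://bucket/table/metadata/v1.metadata.json"
    };
    for (String sample : samples) {
      System.out.println(sample + " -> " + READ_POLICY_KEY + "=" + determineReadPolicy(sample));
    }
  }
}
```

With Hadoop on the classpath, the chosen value would typically be handed to the builder returned by `FileSystem.openFile(path)` via `.opt(READ_POLICY_KEY, policy)`, with `.withFileStatus(status)` added when a `FileStatus` is already in hand, so that object stores can skip the HEAD request on open.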
