steveloughran commented on PR #6877: URL: https://github.com/apache/hadoop/pull/6877#issuecomment-2531814999
FYI: Parquet trunk now uses openFile() with a file status and the declared read policy "parquet, vector, random", so all Hadoop releases >= 3.3.0 will at least use random S3 IO; 3.4.0/3.4.1 use vector IO, and 3.4.2 may use Parquet-specific code paths. This will come in parquet 15.1, leaving Avro and ORC as the next targets. Please grab and test that Parquet beta release to make sure it does what you expect, with S3 and Azure both saving one HEAD request per file.
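The fallback behaviour described above can be sketched as a small standalone model. This is not Hadoop or Parquet code: it assumes, as the comment implies, that the read policy is a comma-separated list and that a filesystem uses the first entry it recognizes. The class `ReadPolicyFallback`, the method `choose`, and the per-release policy sets are hypothetical names and values made up for illustration.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.Set;

// Illustrative model only; not part of Hadoop or Parquet.
final class ReadPolicyFallback {

    /**
     * Returns the first policy in the comma-separated declaration that the
     * given release recognizes, or empty if none match.
     */
    static Optional<String> choose(String declared, Set<String> recognized) {
        return Arrays.stream(declared.split(","))
                .map(String::trim)
                .filter(p -> !p.isEmpty())
                .filter(recognized::contains)
                .findFirst();
    }

    public static void main(String[] args) {
        String declared = "parquet, vector, random";

        // Hypothetical policy sets, loosely following the comment's
        // description of what different Hadoop releases understand.
        Set<String> olderRelease = Set.of("random", "sequential");
        Set<String> vectorAware = Set.of("vector", "random", "sequential");
        Set<String> parquetAware = Set.of("parquet", "vector", "random", "sequential");

        System.out.println(choose(declared, olderRelease).orElse("default"));
        System.out.println(choose(declared, vectorAware).orElse("default"));
        System.out.println(choose(declared, parquetAware).orElse("default"));
    }
}
```

Listing the most specific policy first lets one declaration serve every release: each release picks the best option it understands and ignores the rest.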
