steveloughran commented on PR #1010: URL: https://github.com/apache/parquet-mr/pull/1010#issuecomment-1313557921
I would prefer it if Parquet used the same opt(key, value) builder pattern that we use in the new Hadoop FS API calls:
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/fsdatainputstreambuilder.html

This allows new options to be added in future. The reader could then take them and, where appropriate, map them to the Hadoop openFile options in org.apache.hadoop.fs.Options.OpenFileOptions#FS_OPTION_OPENFILE_STANDARD_OPTIONS, which can then get picked up by the connector.

Passing in split start/end and file length is good.

* file length: used by s3a to skip the HEAD when opening; abfs and gcs could do the same. abfs will take a FileStatus in the withFileStatus() parameter
* split start: where to begin that read
* split end: should be used by prefetchers to know where to stop prefetching

Parquet should set the read policy itself; I'd go for "random, adaptive" as the ordered list, with "vectored" in front of that when vectored IO is to be used.
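To make the suggestion concrete, here is a minimal, self-contained sketch of what an opt(key, value) style builder on the Parquet side might look like, with the file length, split range and read policy mapped onto openFile option keys. The class `ReadOptionsSketch` and its `forSplit` helper are hypothetical and are not part of parquet-mr or Hadoop. The key strings are what I understand the Hadoop standard openFile option names to be and should be checked against `Options.OpenFileOptions`; the policy name "vectored" is copied from the comment above and may differ from the name Hadoop actually defines.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of an opt(key, value) builder; not a real parquet-mr or Hadoop class. */
class ReadOptionsSketch {
    // Assumed key names for the Hadoop standard openFile() options.
    static final String READ_POLICY = "fs.option.openfile.read.policy";
    static final String LENGTH = "fs.option.openfile.length";
    static final String SPLIT_START = "fs.option.openfile.split.start";
    static final String SPLIT_END = "fs.option.openfile.split.end";

    // Insertion-ordered so the options print in the order they were set.
    private final Map<String, String> options = new LinkedHashMap<>();

    /** Sets an arbitrary string option; unknown keys are simply carried along. */
    ReadOptionsSketch opt(String key, String value) {
        options.put(key, value);
        return this;
    }

    /** Convenience overload for numeric options such as lengths and offsets. */
    ReadOptionsSketch opt(String key, long value) {
        return opt(key, Long.toString(value));
    }

    /** Returns a read-only snapshot of the options set so far. */
    Map<String, String> build() {
        return Collections.unmodifiableMap(new LinkedHashMap<>(options));
    }

    /** Builds the options a reader might pass down for a single split. */
    static Map<String, String> forSplit(long fileLength, long splitStart,
                                        long splitEnd, boolean vectoredIo) {
        // Ordered policy list as proposed in the comment; "vectored" is the
        // comment's wording and is an assumption about the real policy name.
        String policy = vectoredIo ? "vectored, random, adaptive" : "random, adaptive";
        return new ReadOptionsSketch()
                .opt(READ_POLICY, policy)
                .opt(LENGTH, fileLength)
                .opt(SPLIT_START, splitStart)
                .opt(SPLIT_END, splitEnd)
                .build();
    }

    public static void main(String[] args) {
        forSplit(1024L, 0L, 512L, false)
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With a real Hadoop client on the classpath, each entry of such a map could be handed to the builder returned by `FileSystem.openFile(path)` through its own `opt(key, value)` calls before `build()`, which yields a `CompletableFuture<FSDataInputStream>`. Options a connector does not recognise are expected to be ignored when set with `opt()` rather than `must()`.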
