[
https://issues.apache.org/jira/browse/PARQUET-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633726#comment-17633726
]
ASF GitHub Bot commented on PARQUET-2213:
-----------------------------------------
steveloughran commented on PR #1010:
URL: https://github.com/apache/parquet-mr/pull/1010#issuecomment-1313557921
I would prefer it if Parquet used the same opt(key, value) builder pattern that
we use in the new Hadoop FS API calls.
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/fsdatainputstreambuilder.html
This allows new options to be added in the future. The reader could then take
them and, where appropriate, map them to the Hadoop openFile options listed in
org.apache.hadoop.fs.Options.OpenFileOptions#FS_OPTION_OPENFILE_STANDARD_OPTIONS,
which can then be picked up by the connector.
Passing in the split start/end and the file length is good.
file length: used by s3a to skip the HEAD request when opening the file; abfs
and gcs could copy this. abfs will take a FileStatus passed in through
withFileStatus().
split start: where to begin the read.
split end: should be used by prefetchers to know where to stop prefetching.
Parquet should set the read policy itself; I'd go for "random, adaptive" as
the ordered list, with "vectored" in front of that when vectored IO is to be
used.
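To make the suggestion concrete, here is a minimal sketch of what an
opt(key, value) style holder for these settings might look like. The class
ReadOptionsSketch and its forSplit() helper are hypothetical names made up for
illustration; they are not part of Parquet or Hadoop. The option key strings
are the ones I understand the Hadoop openFile() documentation to define, so
check them against the Hadoop release you build with.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch only: not an existing Parquet or Hadoop class. */
final class ReadOptionsSketch {
    // Key names assumed from the Hadoop openFile() documentation.
    static final String READ_POLICY = "fs.option.openfile.read.policy";
    static final String LENGTH = "fs.option.openfile.length";
    static final String SPLIT_START = "fs.option.openfile.split.start";
    static final String SPLIT_END = "fs.option.openfile.split.end";

    // Insertion order is kept so the options are easy to read back and log.
    private final Map<String, String> options = new LinkedHashMap<>();

    /** Generic setter: new options can be added later without new methods. */
    ReadOptionsSketch opt(String key, String value) {
        options.put(key, value);
        return this;
    }

    /** Convenience overload for numeric options such as offsets and lengths. */
    ReadOptionsSketch opt(String key, long value) {
        return opt(key, Long.toString(value));
    }

    /** Returns a read-only snapshot of the options set so far. */
    Map<String, String> build() {
        return Collections.unmodifiableMap(new LinkedHashMap<>(options));
    }

    /** Collects the three values discussed above plus a read policy list. */
    static Map<String, String> forSplit(long fileLength, long splitStart, long splitEnd) {
        if (fileLength < 0 || splitStart < 0 || splitEnd < splitStart) {
            throw new IllegalArgumentException("invalid length or split range");
        }
        return new ReadOptionsSketch()
            .opt(READ_POLICY, "random, adaptive") // ordered list, first match wins
            .opt(LENGTH, fileLength)              // lets a connector skip a length probe
            .opt(SPLIT_START, splitStart)         // where the read begins
            .opt(SPLIT_END, splitEnd)             // where prefetching may stop
            .build();
    }
}
```

A reader holding such a map could then copy each entry onto the builder
returned by FileSystem.openFile(path) through its opt(key, value) method, so
any connector that understands a key can act on it and the rest can ignore it.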
> Add an alternative InputFile.newStream that allow an input range
> ----------------------------------------------------------------
>
> Key: PARQUET-2213
> URL: https://issues.apache.org/jira/browse/PARQUET-2213
> Project: Parquet
> Issue Type: Improvement
> Reporter: Chao Sun
> Priority: Minor
>