[jira] [Commented] (PARQUET-2213) Add an alternative InputFile.newStream that allow an input range

ASF GitHub Bot (Jira) Mon, 14 Nov 2022 03:49:08 -0800


    [ 
https://issues.apache.org/jira/browse/PARQUET-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633726#comment-17633726
 ]


ASF GitHub Bot commented on PARQUET-2213:
-----------------------------------------

steveloughran commented on PR #1010:
URL: https://github.com/apache/parquet-mr/pull/1010#issuecomment-1313557921

   I would prefer it if Parquet used the same opt(key, value) builder pattern 
that we use in the new Hadoop FS API calls: 
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/fsdatainputstreambuilder.html
   
   This allows new options to be added in future. The reader could then take 
them and, where appropriate, map them to the Hadoop openFile options in 
org.apache.hadoop.fs.Options.OpenFileOptions#FS_OPTION_OPENFILE_STANDARD_OPTIONS,
 which can then get picked up by the connector.
   
   Passing in split start/end and file length is good.
   file length: used by s3a to skip the HEAD request when opening; abfs and 
gcs could do the same. abfs will take a FileStatus passed in through 
withFileStatus()
   split start: where to begin the read
   split end: should be used by prefetchers to know where to stop prefetching
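
   A minimal sketch of how such an opt(key, value) holder might look on the 
Parquet side, and how file length and split start/end could be mapped onto 
Hadoop openFile() keys. The class RangeReadOptions and the parquet.read.* 
key names are hypothetical and are not part of parquet-mr or Hadoop. The 
fs.option.openfile.* strings are assumed to match the standard option names 
in the Hadoop documentation and should be checked against the Hadoop release 
in use. The sketch has no Hadoop dependency, so it runs standalone.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical option holder; not part of parquet-mr or Hadoop. */
final class RangeReadOptions {
  // Hypothetical Parquet-side keys, invented for this illustration.
  static final String FILE_LENGTH = "parquet.read.file.length";
  static final String SPLIT_START = "parquet.read.split.start";
  static final String SPLIT_END = "parquet.read.split.end";

  // Assumed names of the standard Hadoop openFile() options.
  static final String HADOOP_LENGTH = "fs.option.openfile.length";
  static final String HADOOP_SPLIT_START = "fs.option.openfile.split.start";
  static final String HADOOP_SPLIT_END = "fs.option.openfile.split.end";

  private final Map<String, String> options = new LinkedHashMap<>();

  /** Sets an optional string-valued option; unknown keys are allowed. */
  RangeReadOptions opt(String key, String value) {
    options.put(key, value);
    return this;
  }

  /** Sets an optional numeric option, stored in its decimal string form. */
  RangeReadOptions opt(String key, long value) {
    return opt(key, Long.toString(value));
  }

  /** Maps the recognized options to Hadoop keys and drops the rest. */
  Map<String, String> toHadoopOptions() {
    Map<String, String> out = new LinkedHashMap<>();
    copy(FILE_LENGTH, HADOOP_LENGTH, out);
    copy(SPLIT_START, HADOOP_SPLIT_START, out);
    copy(SPLIT_END, HADOOP_SPLIT_END, out);
    return out;
  }

  private void copy(String from, String to, Map<String, String> out) {
    String value = options.get(from);
    if (value != null) {
      out.put(to, value);
    }
  }
}
```

   A reader could then pass each entry of toHadoopOptions() to the opt() 
method of the builder returned by FileSystem.openFile(path). Options that 
were never set are simply left out, so the connector falls back to its 
defaults.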
   
   Parquet should set the read policy itself. I'd go for "random, adaptive" 
as the ordered list, with "vectored" in front of that when vectored IO is to 
be used.
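
   As a rough illustration of the ordered list described above, the helper 
below builds the read policy string. The class ParquetReadPolicy and its 
method are hypothetical. The key name fs.option.openfile.read.policy is 
assumed to be the standard Hadoop option and should be verified against the 
Hadoop version in use.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of parquet-mr or Hadoop. */
final class ParquetReadPolicy {
  // Assumed name of the standard Hadoop read policy option.
  static final String KEY = "fs.option.openfile.read.policy";

  /** Returns the ordered policy list, most preferred policy first. */
  static String forParquet(boolean vectoredIo) {
    List<String> policies = new ArrayList<>();
    if (vectoredIo) {
      policies.add("vectored"); // only requested when vectored IO is used
    }
    policies.add("random");
    policies.add("adaptive");
    return String.join(", ", policies);
  }
}
```

   The resulting string would be set under ParquetReadPolicy.KEY through 
opt(), which leaves the connector free to pick the first policy in the list 
that it supports.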
   




> Add an alternative InputFile.newStream that allow an input range
> ----------------------------------------------------------------
>
>                 Key: PARQUET-2213
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2213
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Chao Sun
>            Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
