[
https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16703305#comment-16703305
]
Steve Loughran commented on HADOOP-15229:
-----------------------------------------
[~yuzhousun]
I'm ignoring the other formats in this JIRA, as that's going to take extra
testing. What I am trying to do here is get the API right, using async file
open and S3 Select as the validation.
W.r.t. output vs. input formats: again, I've simplified my life here. There's no
control whatsoever over what you get back, which lines the code up for
straightforward parsing.
I propose a follow-up patch, ideally with some stable test datasets that cost
$0 to use, for all supported formats. We make heavy use of the landsat.csv.gz
file as it keeps costs low for anyone testing and there's no setup time, but
it's unstable: once you start making queries off it, it's nice to know what's
going to come back as valid answers.
[~owen.omalley]
Will do. Keeping impl stuff in its own package helps in Java 9+ too, doesn't it?
Though for other FS impls, it'd still need to be public.
> Add FileSystem builder-based openFile() API to match createFile()
> -----------------------------------------------------------------
>
> Key: HADOOP-15229
> URL: https://issues.apache.org/jira/browse/HADOOP-15229
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs, fs/azure, fs/s3
> Affects Versions: 3.0.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch,
> HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch,
> HADOOP-15229-005.patch, HADOOP-15229-006.patch
>
>
> Replicate HDFS-1170 and HADOOP-14365 with an API to open files.
> A key requirement of this is not HDFS; it's to put in the fadvise policy for
> working with object stores, where the choice between a full GET with a TCP
> abort on seek and a series of smaller GETs matters a great deal: the wrong
> option can cost you minutes. S3A and Azure both have adaptive policies now
> (switching on the first backward seek), but they still don't do it that well.
> Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" =
> "random" as an option when they open files; I can imagine other options too.
> The Builder model of [~eddyxu] is the one to mimic, method for method,
> ideally with as much code reuse as possible.
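
To make the shape of the proposal concrete, here is a minimal sketch of a
builder-based, asynchronous open that takes "fs.input.fadvise" as an option.
It is a stand-alone stand-in, not the Hadoop API or the patch itself: the
classes OpenFileSketch, OpenFileBuilder and OpenedFile, the set of supported
keys, the key "x.unknown.hint" and the s3a://bucket/data.orc path are all
assumptions made for illustration. The opt()/must() split (optional hint vs.
mandatory option) is modelled on my understanding of the createFile() builder.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for a builder-based open; not the Hadoop API. */
final class OpenFileSketch {

    /** Result of an open: the path plus the options the store accepted. */
    static final class OpenedFile {
        final String path;
        final Map<String, String> options;

        OpenedFile(String path, Map<String, String> options) {
            this.path = path;
            this.options = options;
        }
    }

    /** Collects optional and mandatory key/value options before opening. */
    static final class OpenFileBuilder {
        private final String path;
        private final Set<String> supportedKeys;
        private final Map<String, String> options = new LinkedHashMap<>();
        private final Set<String> mandatoryKeys = new LinkedHashSet<>();

        OpenFileBuilder(String path, Set<String> supportedKeys) {
            this.path = path;
            this.supportedKeys = supportedKeys;
        }

        /** Optional hint: dropped if the store does not understand the key. */
        OpenFileBuilder opt(String key, String value) {
            options.put(key, value);
            return this;
        }

        /** Mandatory option: build() fails if the key is not understood. */
        OpenFileBuilder must(String key, String value) {
            options.put(key, value);
            mandatoryKeys.add(key);
            return this;
        }

        /** Validates mandatory keys, then completes the open asynchronously. */
        CompletableFuture<OpenedFile> build() {
            for (String key : mandatoryKeys) {
                if (!supportedKeys.contains(key)) {
                    throw new IllegalArgumentException(
                        "Unsupported mandatory option: " + key);
                }
            }
            Map<String, String> accepted = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : options.entrySet()) {
                if (supportedKeys.contains(e.getKey())) {
                    accepted.put(e.getKey(), e.getValue());
                }
            }
            return CompletableFuture.supplyAsync(
                () -> new OpenedFile(path, accepted));
        }
    }

    /** Entry point; this pretend store only understands the fadvise key. */
    static OpenFileBuilder openFile(String path) {
        return new OpenFileBuilder(
            path, Collections.singleton("fs.input.fadvise"));
    }

    public static void main(String[] args) {
        // A columnar reader asking for random IO when it opens a file.
        OpenedFile f = openFile("s3a://bucket/data.orc")
            .opt("fs.input.fadvise", "random")
            .opt("x.unknown.hint", "ignored")
            .build()
            .join();
        System.out.println(f.path + " " + f.options);
    }
}
```

Returning a future from build() is what lets a store start its work in the
background while the caller carries on; the caller only blocks at join().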
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)