[ https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated HADOOP-15229:
------------------------------------
Status: Patch Available (was: Open)
HADOOP-15229 patch 017: checkstyle and comments
* address comments on JIRA, including what AWS logs on select calls
* fix checkstyle/findbugs issues
* add a section in the docs on why you should use the passthrough codec on
.csv files to disable splitting
* and why fs.s3a.blocksize can do this too, but is limited and brittle
* some test enhancements, especially that setReadahead() works, double close()
is safe, and toString() after close() is safe.
Tested: S3 Ireland
[~fabbri] regarding those temp credential/assumed role failures: I saw them, I
know what causes them, and they are fixed in the HADOOP-14556 patch. They happen
when you switch to keeping your AWS secrets in a JCEKS file: the tests aren't
overwriting the login details, even though they update the configuration
object. In the '14556 patch those tests unset the hadoop.credential.provider
option so that the existing secrets are not picked up.
> Add FileSystem builder-based openFile() API to match createFile() + S3 Select
> -----------------------------------------------------------------------------
>
> Key: HADOOP-15229
> URL: https://issues.apache.org/jira/browse/HADOOP-15229
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs, fs/azure, fs/s3
> Affects Versions: 3.0.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch,
> HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch,
> HADOOP-15229-005.patch, HADOOP-15229-006.patch, HADOOP-15229-007.patch,
> HADOOP-15229-009.patch, HADOOP-15229-010.patch, HADOOP-15229-011.patch,
> HADOOP-15229-012.patch, HADOOP-15229-013.patch, HADOOP-15229-014.patch,
> HADOOP-15229-015.patch, HADOOP-15229-016.patch, HADOOP-15229-017.patch
>
>
> Replicate HDFS-1170 and HADOOP-14365 with an API to open files.
> The key requirement here is not HDFS: it is to pass in the fadvise policy
> when working with object stores, where the choice between a full GET with a
> TCP abort on seek and a series of smaller GETs is fundamentally different:
> the wrong option can cost you minutes. S3A and Azure both have adaptive
> policies now (triggered by the first backward seek), but they still don't do
> it that well.
> Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise"
> "random" as an option when they open files; I can imagine other options too.
> The Builder model of [~eddyxu] is the one to mimic, method for method,
> ideally with as much code reuse as possible.
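
To make the proposal above concrete, here is a minimal, self-contained sketch of
what a builder-based open call with an fadvise hint could look like. It is an
illustration only, not the code in the patch: the class OpenFileSketch, its
Builder, the KNOWN_KEYS set and the String result standing in for an input
stream are all hypothetical. The opt()/must() split (optional hint vs. option
that must be understood) is assumed here by analogy with the createFile()
builder; the option key and value are the ones quoted in the description.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for a filesystem offering a builder-based open call. */
public class OpenFileSketch {

    /** Option keys this imaginary store understands (an assumption for the sketch). */
    static final Set<String> KNOWN_KEYS = Set.of("fs.input.fadvise");

    /** Collects options, then "opens" the file asynchronously. */
    public static final class Builder {
        private final String path;
        private final Map<String, String> options = new HashMap<>();
        private final Set<String> mandatory = new HashSet<>();

        Builder(String path) {
            this.path = path;
        }

        /** Optional hint: silently ignored if the store does not understand it. */
        public Builder opt(String key, String value) {
            options.put(key, value);
            mandatory.remove(key);
            return this;
        }

        /** Mandatory option: build() fails if the store does not understand it. */
        public Builder must(String key, String value) {
            options.put(key, value);
            mandatory.add(key);
            return this;
        }

        /** Returns a future; a String describes the opened stream in this sketch. */
        public CompletableFuture<String> build() {
            for (String key : mandatory) {
                if (!KNOWN_KEYS.contains(key)) {
                    return CompletableFuture.failedFuture(
                        new IllegalArgumentException("Unknown mandatory option: " + key));
                }
            }
            // Fall back to a default policy when the caller gave no hint.
            String policy = options.getOrDefault("fs.input.fadvise", "normal");
            return CompletableFuture.completedFuture(path + " [fadvise=" + policy + "]");
        }
    }

    public static Builder openFile(String path) {
        return new Builder(path);
    }

    public static void main(String[] args) {
        // A columnar-format reader declaring random IO up front.
        String opened = openFile("s3a://bucket/data.orc")
            .opt("fs.input.fadvise", "random")
            .build()
            .join();
        System.out.println(opened);
    }
}
```

The point of the shape is that a reader for a columnar format can state its
access pattern before the first read, rather than the store having to guess
it from the first backward seek.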