[ https://issues.apache.org/jira/browse/AVRO-3594?focusedWorklogId=798094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-798094 ]
ASF GitHub Bot logged work on AVRO-3594:
----------------------------------------
Author: ASF GitHub Bot
Created on: 04/Aug/22 16:05
Start Date: 04/Aug/22 16:05
Worklog Time Spent: 10m
Work Description: steveloughran opened a new pull request, #1807:
URL: https://github.com/apache/avro/pull/1807
Boost performance reading from object stores in Hadoop 3.3.5+ by using the
openFile() builder API, passing in the file length as an option (which can
save a HEAD request) and asking for adaptive IO (sequential reads, switching
to random IO if the client starts seeking).
Saving that HEAD request is a key benefit against S3, as it can save 50-100 ms
per file.
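The adaptive IO policy described above can be illustrated with a toy model. This is a hypothetical sketch, not S3A or Hadoop code. `AdaptiveStream`, the 64 KiB random-mode block size, and the rule "switch to random IO after the first backward seek" are all assumptions chosen for illustration. To keep it simple, every `read()` issues a new ranged GET, whereas a real stream would keep reading from an already open one.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of an "adaptive" read policy (hypothetical classes, not Hadoop
 * code): reads start sequential and switch to random IO after the first
 * backward seek. For simplicity every read() issues a new ranged GET.
 */
public class AdaptiveReadPolicyDemo {

  enum Mode { SEQUENTIAL, RANDOM }

  static final class AdaptiveStream {
    private final long fileLength;
    private final long randomBlockSize; // assumed GET size in random mode
    private long position = 0;
    private Mode mode = Mode.SEQUENTIAL;
    /** [start, end) byte range of each simulated GET request. */
    final List<long[]> requestedRanges = new ArrayList<>();

    AdaptiveStream(long fileLength, long randomBlockSize) {
      this.fileLength = fileLength;
      this.randomBlockSize = randomBlockSize;
    }

    Mode mode() {
      return mode;
    }

    void seek(long newPosition) {
      // A backward seek hints at columnar/random access: switch for good.
      if (newPosition < position) {
        mode = Mode.RANDOM;
      }
      position = newPosition;
    }

    void read(long len) {
      // Sequential: request everything up to EOF so one GET can stream it.
      // Random: request only a bounded block starting at the current position.
      long end = mode == Mode.SEQUENTIAL
          ? fileLength
          : Math.min(fileLength, position + Math.max(len, randomBlockSize));
      requestedRanges.add(new long[] {position, end});
      position = Math.min(fileLength, position + len);
    }
  }

  public static void main(String[] args) {
    AdaptiveStream in = new AdaptiveStream(1_000_000, 65_536);
    in.read(1_000);     // sequential GET [0, 1000000)
    in.seek(900_000);   // forward seek: stays sequential
    in.read(1_000);     // sequential GET [900000, 1000000)
    in.seek(10_000);    // backward seek: switches to random
    in.read(1_000);     // random GET [10000, 75536)
    for (long[] r : in.requestedRanges) {
      System.out.println("GET bytes " + r[0] + "-" + r[1]);
    }
    System.out.println("final mode: " + in.mode());
  }
}
```

With these inputs the three simulated GETs cover bytes 0-1000000, 900000-1000000 and 10000-75536, and the final mode is RANDOM. The long sequential ranges suit whole-file scans of Avro data, while the bounded ranges avoid wasting bandwidth once the reader starts jumping around.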
### Jira
- [X] My PR addresses the following [Avro
Jira](https://issues.apache.org/jira/browse/AVRO/) issues and references them
in the PR title. For example, "AVRO-1234: My Avro PR"
- https://issues.apache.org/jira/browse/AVRO-XXX
- In case you are adding a dependency, check if the license complies with
the [ASF 3rd Party License
Policy](https://www.apache.org/legal/resolved.html#category-x).
### Tests
- [X] My PR does not need testing for this extremely good reason:
1. All existing local file IO tests act as regression tests.
2. Avro isn't set up for integration tests with abfs/gs/s3a URLs.
3. Mocking doesn't really do much here.
Integration tests would be the way to do this, but the foundational setup
is pretty complex. My
[cloudstream](https://github.com/hortonworks-spark/cloud-integration) project,
downstream of Spark, is set up to do this.
### Commits
- [X] My commits all reference Jira issues in their subject lines. In
addition, my commits follow the guidelines from "[How to write a good git
commit message](https://chris.beams.io/posts/git-commit/)":
1. Subject is separated from body by a blank line
1. Subject is limited to 50 characters (not including Jira issue reference)
1. Subject does not end with a period
1. Subject uses the imperative mood ("add", not "adding")
1. Body wraps at 72 characters
1. Body explains "what" and "why", not "how"
### Documentation
- [X] In case of new functionality, my PR adds documentation that describes
how to use it.
- All the public functions and the classes in the PR contain Javadoc that
explains what they do
No new docs.
Issue Time Tracking
-------------------
Worklog Id: (was: 798094)
Remaining Estimate: 0h
Time Spent: 10m
> FsInput to use openFile() API for cloud storage read performance
> ----------------------------------------------------------------
>
> Key: AVRO-3594
> URL: https://issues.apache.org/jira/browse/AVRO-3594
> Project: Apache Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.11.2
> Reporter: Steve Loughran
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Avro can now use the FileSystem.openFile() API to open a file on a Hadoop
> filesystem connector (HADOOP-15229).
> By setting the file length and fadvise policy through opt() calls,
> clients can
> * skip a HEAD request when opening a file
> * optimise the ranges of GET requests for sequential access, even in clusters
> where S3A has been configured to use random IO (which some Hive clusters do)
> Filesystems/releases which don't recognise the options added in HADOOP-16202
> will ignore them; the API will fall back to the classic open(path) call if
> the connector doesn't have a custom implementation.
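The fallback behaviour described above can be sketched with a toy model, under stated assumptions. `Connector` and its `openFile(String, Map)` method are hypothetical stand-ins, not Hadoop classes. Only the two option keys are taken from Hadoop 3.3.5 (HADOOP-16202), to the best of our knowledge. A connector that recognises the length option skips the HEAD. A connector that recognises neither option silently ignores both and probes the object first, much as a fallback to `open(path)` would.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Toy model (hypothetical classes, not the Hadoop API) of how openFile()
 * opt() hints behave: a connector uses the options it recognises and
 * silently ignores the rest, issuing a HEAD request to learn the file
 * length when none was supplied.
 */
public class OpenFileOptionsDemo {

  // Option keys as named in Hadoop 3.3.5+ (HADOOP-16202), assumed here.
  static final String LENGTH = "fs.option.openfile.length";
  static final String READ_POLICY = "fs.option.openfile.read.policy";

  static final class Connector {
    private final Set<String> recognised;
    int headRequests = 0;

    Connector(Set<String> recognised) {
      this.recognised = recognised;
    }

    /** Simulates opening a file; returns the read policy in effect. */
    String openFile(String path, Map<String, String> opts) {
      Map<String, String> used = new HashMap<>(opts);
      used.keySet().retainAll(recognised); // unknown opt() keys are dropped
      if (!used.containsKey(LENGTH)) {
        headRequests++; // length unknown: probe the object first
      }
      return used.getOrDefault(READ_POLICY, "default");
    }
  }

  public static void main(String[] args) {
    Map<String, String> opts = Map.of(LENGTH, "123456", READ_POLICY, "adaptive");
    Connector newer = new Connector(Set.of(LENGTH, READ_POLICY));
    Connector older = new Connector(Set.of()); // recognises neither option

    int files = 1_000;
    for (int i = 0; i < files; i++) {
      newer.openFile("s3a://bucket/part-" + i + ".avro", opts);
      older.openFile("s3a://bucket/part-" + i + ".avro", opts);
    }
    int saved = older.headRequests - newer.headRequests;
    // At 50-100 ms per HEAD, opening the files one after another saves
    // roughly saved * 50 ms to saved * 100 ms of latency.
    System.out.printf("HEADs saved: %d (~%d-%d s)%n",
        saved, saved * 50 / 1000, saved * 100 / 1000);
  }
}
```

Running `main` prints `HEADs saved: 1000 (~50-100 s)`. The seconds figure simply multiplies by the 50-100 ms per-HEAD range quoted in the PR and assumes the files are opened serially; parallel opens would hide some of that latency.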
--
This message was sent by Atlassian Jira
(v8.20.10#820010)