[ 
https://issues.apache.org/jira/browse/AVRO-3594?focusedWorklogId=798094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-798094
 ]

ASF GitHub Bot logged work on AVRO-3594:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Aug/22 16:05
            Start Date: 04/Aug/22 16:05
    Worklog Time Spent: 10m 
      Work Description: steveloughran opened a new pull request, #1807:
URL: https://github.com/apache/avro/pull/1807

   
   Boost performance when reading from object stores in Hadoop 3.3.5+ by using 
the openFile builder API, passing in the file length as an option (which can 
save a HEAD request) and asking for adaptive IO (sequential, switching to 
random if the client starts seeking).
   
   Saving that HEAD request is a key benefit against S3, as it can save 50-100 
ms per file.
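
As a rough illustration of the adaptive IO behaviour described above, here is a toy Java model. It assumes the switch from sequential to random IO is triggered by the first backward seek. The class `AdaptiveReadSketch` and its methods are hypothetical and are not part of Hadoop or Avro; a real connector would also change how it issues GET ranges once the policy flips.

```java
/** Toy model of an adaptive read policy; hypothetical, not the S3A implementation. */
final class AdaptiveReadSketch {
  enum Policy { SEQUENTIAL, RANDOM }

  // Assumption: an adaptive stream starts out in sequential mode.
  private Policy policy = Policy.SEQUENTIAL;
  private long pos = 0;

  /** Move the read position; a backward seek flips the policy to RANDOM for good. */
  void seek(long target) {
    if (target < 0) {
      throw new IllegalArgumentException("negative seek: " + target);
    }
    if (target < pos) {
      policy = Policy.RANDOM;
    }
    pos = target;
  }

  /** Pretend to read n bytes by advancing the position. */
  void read(long n) {
    pos += n;
  }

  Policy policy() {
    return policy;
  }

  public static void main(String[] args) {
    AdaptiveReadSketch in = new AdaptiveReadSketch();
    in.read(4096);
    in.seek(8192);                    // forward seek: policy is unchanged
    System.out.println(in.policy());
    in.seek(0);                       // backward seek: policy becomes RANDOM
    System.out.println(in.policy());
  }
}
```

In this model a reader that only moves forward, such as a scan through an Avro container file, never leaves sequential mode.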
   
   ### Jira
   
   - [X] My PR addresses the following [Avro 
Jira](https://issues.apache.org/jira/browse/AVRO/) issues and references them 
in the PR title. For example, "AVRO-1234: My Avro PR"
     - https://issues.apache.org/jira/browse/AVRO-XXX
     - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Tests
   
   - [X] My PR does not need testing for this extremely good reason:
   
   1. All existing local file IO tests act as regression tests.
   2. Avro isn't set up for integration tests with abfs/gs/s3a URLs.
   3. Mocking doesn't really do much here.
   
   Integration tests would be the way to do this, but the foundational setup 
is pretty complex. My 
[cloudstream](https://github.com/hortonworks-spark/cloud-integration) project, 
downstream of Spark, is set up to do this.
   
   ### Commits
   
   - [X] My commits all reference Jira issues in their subject lines. In 
addition, my commits follow the guidelines from "[How to write a good git 
commit message](https://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [X] In case of new functionality, my PR adds documentation that describes 
how to use it.
     - All the public functions and the classes in the PR contain Javadoc that 
explain what it does
   
   no new docs
   




Issue Time Tracking
-------------------

            Worklog Id:     (was: 798094)
    Remaining Estimate: 0h
            Time Spent: 10m

> FsInput to use openFile() API for cloud storage read performance
> ----------------------------------------------------------------
>
>                 Key: AVRO-3594
>                 URL: https://issues.apache.org/jira/browse/AVRO-3594
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.11.2
>            Reporter: Steve Loughran
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Avro can now use the FileSystem.openFile() API to open a file on a Hadoop 
> filesystem connector (HADOOP-15229).
> By setting the file length and fadvise policy through opt() calls, 
> clients can
> * skip a HEAD request when opening a file
> * optimise the ranges of GET requests for sequential access, even in clusters 
> where s3a has been configured to use random IO (which some Hive clusters do)
>
> Filesystems/releases which don't recognise the options added in HADOOP-16202 
> will ignore them; the API will fall back to the classic open(path) call if 
> the connector doesn't have a custom implementation.
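
The effect of the length hint and of ignored options can be sketched with a toy model. Everything below (`ToyStore`, `ToyOpenBuilder`, `ToyStream`) is hypothetical and only mimics the shape of the builder; it is not the Hadoop implementation and does not talk to any store. The keys `fs.option.openfile.length` and `fs.option.openfile.read.policy` are believed to be the standard option names in Hadoop 3.3.5+, while `x.unknown.option` is made up to show an unrecognised hint being ignored.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

/** Toy object store that counts HEAD requests; hypothetical, not a Hadoop class. */
final class ToyStore {
  private final Map<String, byte[]> objects = new HashMap<>();
  private int headRequests = 0;

  void put(String path, byte[] data) { objects.put(path, data); }

  int headRequests() { return headRequests; }

  /** Simulated HEAD: how the store learns the length when the caller gives no hint. */
  private long head(String path) {
    headRequests++;
    return objects.get(path).length;
  }

  /** Classic open(path): always probes the object first. */
  ToyStream open(String path) {
    return new ToyStream(head(path), "default");
  }

  /** Builder-style open: options are collected first, then applied in build(). */
  ToyOpenBuilder openFile(String path) { return new ToyOpenBuilder(this, path); }

  static final class ToyStream {
    final long length;
    final String readPolicy;

    ToyStream(long length, String readPolicy) {
      this.length = length;
      this.readPolicy = readPolicy;
    }
  }

  static final class ToyOpenBuilder {
    private final ToyStore store;
    private final String path;
    private final Map<String, String> options = new HashMap<>();

    ToyOpenBuilder(ToyStore store, String path) {
      this.store = store;
      this.path = path;
    }

    /** Optional hint: keys this store does not understand are stored but never read. */
    ToyOpenBuilder opt(String key, String value) {
      options.put(key, value);
      return this;
    }

    CompletableFuture<ToyStream> build() {
      String len = options.get("fs.option.openfile.length");
      // With a length hint there is no need for the HEAD probe.
      long length = (len != null) ? Long.parseLong(len) : store.head(path);
      String policy = options.getOrDefault("fs.option.openfile.read.policy", "default");
      return CompletableFuture.completedFuture(new ToyStream(length, policy));
    }
  }

  public static void main(String[] args) {
    ToyStore store = new ToyStore();
    store.put("data.avro", new byte[1024]);
    store.open("data.avro");                         // one HEAD
    ToyStream in = store.openFile("data.avro")
        .opt("fs.option.openfile.length", "1024")    // skips the HEAD
        .opt("fs.option.openfile.read.policy", "adaptive")
        .opt("x.unknown.option", "ignored")          // unrecognised, no effect
        .build()
        .join();
    System.out.println(store.headRequests() + " " + in.length + " " + in.readPolicy);
  }
}
```

In the toy model the classic open costs one simulated HEAD and the hinted openFile costs none, which is the saving the issue describes. A caller that omits the length option falls back to the probe.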



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
