[ https://issues.apache.org/jira/browse/HADOOP-14965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215788#comment-16215788 ]

ASF GitHub Bot commented on HADOOP-14965:
-----------------------------------------

GitHub user steveloughran opened a pull request:

    https://github.com/apache/hadoop/pull/283

    HADOOP-14965 s3a input stream "normal" fadvise mode to be adaptive

    This makes {{S3AInputStream.inputPolicy}} non-final and, on the first backwards seek on a Normal input, switches it to Random (logging at INFO in the process). Forward seeks just skip forward, as sequential input does.
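The switching rule described above can be sketched as follows. This is a minimal, dependency-free model, not the actual `S3AInputStream` code: the enum `InputPolicy` and the class `AdaptivePolicyTracker` are hypothetical names, and the model only tracks which policy is active, ignoring readahead, HTTP range requests and connection handling.

```java
/** Hypothetical stand-in for the S3A fadvise policies (not the Hadoop enum). */
enum InputPolicy { NORMAL, SEQUENTIAL, RANDOM }

/**
 * Hypothetical model of the adaptive rule: a stream opened with NORMAL
 * behaves sequentially until its first backward seek, then switches to RANDOM.
 */
class AdaptivePolicyTracker {
  private InputPolicy policy;   // deliberately non-final so it can change
  private long pos;

  AdaptivePolicyTracker(InputPolicy initial) {
    this.policy = initial;
  }

  /** Moves to target; returns true if this seek changed the policy. */
  boolean seek(long target) {
    boolean backwards = target < pos;
    pos = target;
    if (backwards && policy == InputPolicy.NORMAL) {
      policy = InputPolicy.RANDOM;  // the PR says this switch is logged at INFO
      return true;
    }
    // Forward seeks, and any seek under SEQUENTIAL or RANDOM, keep the policy.
    // In the real stream a forward seek under NORMAL skips bytes, as SEQUENTIAL does.
    return false;
  }

  /** Advances the position as if bytes had been read. */
  void read(int bytes) {
    pos += bytes;
  }

  InputPolicy policy() {
    return policy;
  }

  public static void main(String[] args) {
    AdaptivePolicyTracker t = new AdaptivePolicyTracker(InputPolicy.NORMAL);
    t.read(100);
    t.seek(500);                     // forward seek: stays NORMAL
    System.out.println(t.policy());  // NORMAL
    t.seek(10);                      // first backward seek: switches
    System.out.println(t.policy());  // RANDOM
  }
}
```

Only the first backward seek on a NORMAL stream changes anything; a stream opened as SEQUENTIAL or RANDOM keeps its policy for its whole life.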
    
    The input stream instrumentation counts the number of times the policy was changed (including the first) and records the current value. The tests pick both up from there, so there's no need to add a test accessor to the input stream itself.
    
    The test {{ITestS3AInputStreamPerformance.testRandomIONormalPolicy}} broke because the instrumentation showed only 1 TCP abort, not the 4 it expected. This is a success, as it shows the policy is adapting.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/steveloughran/hadoop s3/HADOOP-14965-adaptive-seek

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hadoop/pull/283.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #283
    
----
commit c581840a800dd22372c4a2b78c3ce5c2da2fd3fe
Author: Steve Loughran <[email protected]>
Date:   2017-10-23T20:27:00Z

    HADOOP-14965 patch 001: the "normal" input policy switches from sequential to random IO
    
    Change-Id: I95459f063b5da973619334bacae7fd89953e1bec

----


> s3a input stream "normal" fadvise mode to be adaptive
> -----------------------------------------------------
>
>                 Key: HADOOP-14965
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14965
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> HADOOP-14535 added seek optimisation to wasb, but rather than require the 
> caller to declare sequential vs random, it works out for itself.
> # defaults to sequential, lazy seek
> # if the caller ever seeks backwards, switches to random IO.
> This means that with the access pattern of columnar stores (go to the end of 
> the file, read the summary, then go to the columns and work forwards), the 
> stream will switch to random IO after that first seek back (cost: one aborted 
> HTTP connection).
> Where this should benefit most is in downstream apps which work with different 
> data sources in the same object store, running off the same app config, but 
> with different read patterns. I'm seeing exactly this in some of my Spark 
> tests, where it's nearly impossible to set things up so that .gz files are 
> read sequentially but ORC data is read with random IO.
> I propose: "normal" fadvise => adaptive, sequential => sequential always, 
> random => random from the outset.
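The proposed semantics, and their effect on the columnar read pattern described in the issue, can be modelled in a few lines. This is a hypothetical sketch, not S3A code: `FadviseProposal`, `readColumnar` and the 1024-byte column chunk are illustrative, and the model makes a simplifying assumption stated in the comments, namely that each backward seek made while reading sequentially costs one aborted HTTP connection while seeks in random mode cost nothing.

```java
import java.util.List;

/**
 * Hypothetical model of the proposed fadvise semantics:
 *   normal     -> adaptive: sequential until the first backward seek, then random
 *   sequential -> sequential always
 *   random     -> random from the outset
 * Simplifying assumption: each backward seek made while reading sequentially
 * aborts one HTTP connection; seeks in random mode are treated as free.
 */
class FadviseProposal {
  enum Fadvise { NORMAL, SEQUENTIAL, RANDOM }

  private final Fadvise configured;
  private boolean randomIO;
  private long pos;
  private int abortedConnections;

  FadviseProposal(Fadvise configured) {
    this.configured = configured;
    this.randomIO = configured == Fadvise.RANDOM;
  }

  void seek(long target) {
    if (target < pos && !randomIO) {
      abortedConnections++;           // drop the open sequential GET
      if (configured == Fadvise.NORMAL) {
        randomIO = true;              // adapt once; never switch back
      }
    }
    pos = target;
  }

  void read(long bytes) {
    pos += bytes;
  }

  boolean isRandomIO() {
    return randomIO;
  }

  int abortedConnections() {
    return abortedConnections;
  }

  /** Columnar pattern: read the footer, then seek to each column offset in turn. */
  static FadviseProposal readColumnar(Fadvise policy, long fileLen, long footerLen,
                                      List<Long> columnOffsets) {
    FadviseProposal s = new FadviseProposal(policy);
    s.seek(fileLen - footerLen);      // forward seek to the summary/footer
    s.read(footerLen);
    for (long offset : columnOffsets) {
      s.seek(offset);
      s.read(1024);                   // hypothetical column chunk size
    }
    return s;
  }

  public static void main(String[] args) {
    // Two passes over the columns: the second revisits offset 0.
    List<Long> columns = List.of(0L, 4096L, 0L, 8192L);
    for (Fadvise p : Fadvise.values()) {
      FadviseProposal s = readColumnar(p, 1_000_000, 16_384, columns);
      System.out.println(p + ": random=" + s.isRandomIO() + " aborts=" + s.abortedConnections());
    }
    // NORMAL: random=true aborts=1
    // SEQUENTIAL: random=false aborts=2
    // RANDOM: random=true aborts=0
  }
}
```

Under this model the adaptive policy pays for exactly one aborted connection (the first seek back from the footer), matching the cost named in the issue, while a fixed sequential policy pays again on every later backward seek.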



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
