[jira] [Commented] (HADOOP-14965) s3a input stream "normal" fadvise mode to be adaptive

Hudson (JIRA) Wed, 20 Dec 2017 10:45:23 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-14965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298906#comment-16298906 ]


Hudson commented on HADOOP-14965:
---------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13411 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13411/])
HADOOP-14965. S3a input stream "normal" fadvise mode to be adaptive (stevel: 
rev 1ba491ff907fc5d2618add980734a3534e2be098)
* (edit) hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
* (edit) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInstrumentation.java


> s3a input stream "normal" fadvise mode to be adaptive
> -----------------------------------------------------
>
>                 Key: HADOOP-14965
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14965
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>             Fix For: 3.1.0
>
>         Attachments: HADOOP-14965-001.patch, HADOOP-14965-002.patch, 
> HADOOP-14965-003.patch, HADOOP-14965-004.patch
>
>
> HADOOP-14535 added seek optimisation to wasb, but rather than requiring the 
> caller to declare sequential vs random IO, it works this out for itself:
> # defaults to sequential, lazy seek
> # if the caller ever seeks backwards, switches to random IO.
> This means that with the usage pattern of columnar stores (go to the end of
> the file, read the summary, then go to the columns and work forwards), the
> stream will switch to random IO after that first backward seek (cost: one
> aborted HTTP connection).
> This should benefit downstream apps most where they work with different data
> sources in the same object store, running off the same app config, but with
> different read patterns. I'm seeing exactly this in some of my Spark tests,
> where it's nearly impossible to set things up so that .gz files are read
> sequentially but ORC data is read with random IO.
> I propose: "normal" fadvise => adaptive; "sequential" => sequential always;
> "random" => random from the outset.
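
The proposed behaviour can be sketched as a small state model. This is a hypothetical Python illustration, not the S3AInputStream code from the patch: the class name, the InputPolicy enum and the method names are assumptions, and no HTTP I/O is modelled, only the policy decision. Seeks are lazy, so a backward seek is only detected when the next read actually repositions the stream.

```python
from enum import Enum


class InputPolicy(Enum):
    SEQUENTIAL = "sequential"
    RANDOM = "random"


class AdaptiveInputStreamModel:
    """Hypothetical model of the proposed fadvise modes (policy only, no I/O).

    "normal"     -> start sequential, switch to random on the first backward seek
    "sequential" -> sequential always
    "random"     -> random from the outset
    """

    def __init__(self, fadvise: str):
        if fadvise not in ("normal", "sequential", "random"):
            raise ValueError(f"unknown fadvise mode: {fadvise}")
        self.fadvise = fadvise
        self.policy = (InputPolicy.RANDOM if fadvise == "random"
                       else InputPolicy.SEQUENTIAL)
        self.pos = 0            # position of the (modelled) open stream
        self.next_read_pos = 0  # target recorded by seek(), applied lazily

    def seek(self, target: int) -> None:
        if target < 0:
            raise ValueError("negative seek")
        # Lazy seek: only remember where the next read should start.
        self.next_read_pos = target

    def read(self, n: int) -> None:
        self._lazy_seek()
        self.pos += n
        self.next_read_pos = self.pos

    def _lazy_seek(self) -> None:
        # In adaptive ("normal") mode, the first backward move flips the
        # policy; in the real stream this is where the open HTTP GET
        # would be aborted.
        if self.next_read_pos < self.pos and self.fadvise == "normal":
            self.policy = InputPolicy.RANDOM
        self.pos = self.next_read_pos


# Columnar-store pattern: read the footer, then seek back to a column.
s = AdaptiveInputStreamModel("normal")
s.seek(1000)
s.read(16)   # forward seek: still sequential
s.seek(100)
s.read(10)   # backward seek: now random
print(s.policy)
```

A backward seek that is superseded by a forward seek before any read never reaches the stream, so it does not trigger the switch; that is a consequence of the lazy seek in this model.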


