[ 
https://issues.apache.org/jira/browse/HADOOP-14965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215310#comment-16215310
 ] 

Steve Loughran commented on HADOOP-14965:
-----------------------------------------

Without any data on real-world use, here's how the new adaptive scheme breaks a 
test: it cuts the number of stream aborts from 4 to 1, so the assertion 
expecting 4 aborts fails. Note also that the stream stats now include the enum 
value of the input policy and the number of times it has been set.

{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.fs.s3a.scale.ITestS3AInputStreamPerformance
Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 99.27 sec <<< FAILURE! - in org.apache.hadoop.fs.s3a.scale.ITestS3AInputStreamPerformance
testRandomIONormalPolicy(org.apache.hadoop.fs.s3a.scale.ITestS3AInputStreamPerformance)  Time elapsed: 5.651 sec  <<< FAILURE!
java.lang.AssertionError: streams aborted in StreamStatistics{OpenOperations=4, CloseOperations=4, Closed=3, Aborted=1, SeekOperations=2, ReadExceptions=0, ForwardSeekOperations=0, BackwardSeekOperations=2, BytesSkippedOnSeek=0, BytesBackwardsOnSeek=6356992, BytesRead=1376256, BytesRead excluding skipped=1376256, ReadOperations=161, ReadFullyOperations=4, ReadsIncomplete=157, BytesReadInClose=0, BytesDiscardedInAbort=43375083, InputPolicy=2, InputPolicySetCount=2} expected:<4> but was:<1>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:555)
        at org.apache.hadoop.fs.s3a.scale.ITestS3AInputStreamPerformance.testRandomIONormalPolicy(ITestS3AInputStreamPerformance.java:429)
{code}

> s3a input stream "normal" fadvise mode to be adaptive
> -----------------------------------------------------
>
>                 Key: HADOOP-14965
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14965
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Steve Loughran
>
> HADOOP-14535 added seek optimisation to wasb, but rather than require the 
> caller to declare sequential vs random, it works it out for itself:
> # defaults to sequential, lazy seek
> # if the caller ever seeks backwards, switches to random IO.
> This means that the use pattern of columnar stores (go to the end of the 
> file, read the summary, then go to the columns and work forwards) will switch 
> to random IO after that first seek back (cost: one aborted HTTP connection).
> Where this should benefit the most is in downstream apps where you are 
> working with different data sources in the same object store, or running with 
> the same app config, but with different read patterns. I'm seeing exactly this 
> in some of my spark tests, where it's near impossible to set things up so that 
> .gz files are read sequentially, but ORC data is read with random IO.
> I propose: "normal" fadvise => adaptive, "sequential" => sequential always, 
> "random" => random from the outset.
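
The adaptive behaviour described above (start sequential, switch to random IO on the first backward seek) can be sketched roughly as follows. This is only an illustration, not the S3A or wasb code: the class, enum and method names are all hypothetical, and it assumes a simplified model in which only a backward seek under the sequential policy costs an aborted connection.

```java
/**
 * Sketch of an adaptive seek policy. All names are hypothetical;
 * this is not the S3A or wasb input stream implementation.
 */
final class AdaptiveSeekPolicy {

  /** The two read policies the sketch switches between. */
  enum Policy { SEQUENTIAL, RANDOM }

  private Policy policy = Policy.SEQUENTIAL; // default: sequential, lazy seek
  private long pos = 0;                      // current logical position
  private int policySetCount = 0;            // how often the policy changed
  private int aborts = 0;                    // modelled aborted connections

  /** Record a seek to targetPos, adapting the policy if required. */
  void seek(long targetPos) {
    if (targetPos < 0) {
      throw new IllegalArgumentException("negative seek: " + targetPos);
    }
    if (targetPos < pos && policy == Policy.SEQUENTIAL) {
      // Assumption: a backward seek on a sequential stream forces the
      // open connection to be aborted. Pay that cost once, then switch.
      aborts++;
      policy = Policy.RANDOM;
      policySetCount++;
    }
    // Simplification: under RANDOM, later seeks are treated as free.
    pos = targetPos;
  }

  /** Record a read of the given number of bytes. */
  void read(long bytes) {
    pos += bytes;
  }

  Policy policy() { return policy; }
  int policySetCount() { return policySetCount; }
  int aborts() { return aborts; }
}
```

Under this model, a columnar-style read (seek to the footer, read it, seek back to a column, read, seek back again) costs one abort and leaves the stream in random mode, whereas a purely forward read never leaves sequential mode.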



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
