Steve Loughran created HADOOP-14965:
---------------------------------------

             Summary: s3a input stream "normal" fadvise mode to be adaptive
                 Key: HADOOP-14965
                 URL: https://issues.apache.org/jira/browse/HADOOP-14965
             Project: Hadoop Common
          Issue Type: Sub-task
            Reporter: Steve Loughran


HADOOP-14535 added seek optimisation to wasb, but rather than requiring the 
caller to declare sequential vs random, it works it out for itself:

# defaults to sequential, lazy seek
# if the caller ever seeks backwards, switches to random IO.
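
A minimal Java sketch of the two rules above. It is not the wasb or s3a code: the class and method names are hypothetical, and it assumes that "seeks backwards" means the lazily recorded seek target is below the current position when the next read happens.

```java
/** Hypothetical sketch of an adaptive read policy; not the actual wasb/s3a code. */
final class AdaptiveReadPolicy {
    enum Mode { SEQUENTIAL, RANDOM }

    private Mode mode = Mode.SEQUENTIAL; // rule 1: default to sequential
    private long pos = 0;                // position of the next byte to read
    private long pendingSeek = -1;       // lazy seek target, -1 if none

    /** Lazy seek: only record the target, no IO is triggered here. */
    void seek(long target) {
        if (target < 0) {
            throw new IllegalArgumentException("negative seek: " + target);
        }
        pendingSeek = target;
    }

    /** On read, resolve any pending seek and return the mode to read with. */
    Mode onRead(int bytes) {
        if (pendingSeek >= 0) {
            if (pendingSeek < pos) {
                mode = Mode.RANDOM; // rule 2: a backward seek switches for good
            }
            pos = pendingSeek;
            pendingSeek = -1;
        }
        pos += bytes;
        return mode;
    }

    Mode getMode() {
        return mode;
    }
}
```

In this sketch a forward seek leaves the stream in sequential mode, and once a backward seek has been seen the stream never returns to sequential.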

This means that the typical access pattern of columnar stores (go to the end 
of the file, read the summary, then go to the columns and work forwards) will 
switch to random IO after that first seek back (cost: one aborted HTTP 
connection).

Where this should benefit the most is in downstream apps which work with 
different data sources in the same object store and run off the same app 
config, but have different read patterns. I'm seeing exactly this in some of my 
Spark tests, where it's near impossible to set things up so that .gz files are 
read sequentially but ORC data is read with random IO.

I propose that the "normal" fadvise mode becomes adaptive, "sequential" stays 
sequential always, and "random" is random from the outset.
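
The proposed meaning of the three fadvise values could be expressed roughly as follows. This is only an illustration: the enum, its fields and its methods are hypothetical names, not the s3a implementation; only the three option strings come from the proposal.

```java
import java.util.Locale;

/** Hypothetical illustration of the proposed semantics of each fadvise value. */
enum FadviseSketch {
    NORMAL(false, true),      // starts sequential, may switch to random
    SEQUENTIAL(false, false), // sequential always
    RANDOM(true, false);      // random from the outset

    final boolean startsRandom;
    final boolean adaptive;

    FadviseSketch(boolean startsRandom, boolean adaptive) {
        this.startsRandom = startsRandom;
        this.adaptive = adaptive;
    }

    /** Parse an option string such as "normal", "sequential" or "random". */
    static FadviseSketch fromOption(String value) {
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    /** Whether to read with random IO, given whether a backward seek was seen. */
    boolean useRandomIO(boolean seenBackwardSeek) {
        return startsRandom || (adaptive && seenBackwardSeek);
    }
}
```

Under this sketch only "normal" reacts to the observed seek pattern; the other two values keep whatever the caller declared.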


