[ 
https://issues.apache.org/jira/browse/HADOOP-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888794#comment-16888794
 ] 

Steve Loughran commented on HADOOP-16437:
-----------------------------------------

We have space in S3AFileSystem.addDeprecatedKeys() for precisely this kind of 
thing. This should be backported to branch-2 and 3.2.x.
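
To illustrate what a deprecated-key entry buys you, here is a minimal, 
Hadoop-free sketch. It assumes the idea is to register the misdocumented key 
as an alias of the real one; the class and its methods are hypothetical and 
are not the Hadoop Configuration API.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: a toy config store with deprecated-key aliases. */
final class DeprecatedKeySketch {
    static final String OLD_KEY = "fs.s3a.experimental.fadvise";
    static final String NEW_KEY = "fs.s3a.experimental.input.fadvise";

    private final Map<String, String> deprecations = new HashMap<>();
    private final Map<String, String> values = new HashMap<>();

    /** Register oldKey as an alias that resolves to newKey. */
    void addDeprecation(String oldKey, String newKey) {
        deprecations.put(oldKey, newKey);
    }

    /** Map a possibly-deprecated key to the key that is actually stored. */
    private String resolve(String key) {
        return deprecations.getOrDefault(key, key);
    }

    void set(String key, String value) {
        values.put(resolve(key), value);
    }

    String get(String key, String defaultValue) {
        return values.getOrDefault(resolve(key), defaultValue);
    }
}
```

With the alias registered, a job that followed the old documentation and set 
the old key would still have its value picked up by code that reads the new key.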

3.2 will switch from sequential to random IO the first time you do a backward 
seek, so when reading ORC & Parquet files, if they read the footer and then 
seek back to the first columns they actually want to scan, it'll kick off the 
switch. The random mode lets you do that without wasting the first read (if the 
pos() after the footer read < (len(file) - fs.s3a.readahead.range), we'll abort 
and switch to ranged GETs; otherwise it'll just scan to EOF and recycle that 
HTTPS connection).
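
As a rough model of the two rules above (the switch on the first backward 
seek, and the abort-or-drain test), here is a small sketch. It is a 
simplification for illustration, not the S3A input stream code; the class, 
enum and method names are made up.

```java
/** Illustrative only: a simplified model of the seek behaviour described above. */
final class SeekPolicySketch {
    enum Policy { SEQUENTIAL, RANDOM }

    private Policy policy = Policy.SEQUENTIAL;
    private long pos = 0;

    /** A backward seek while in sequential mode flips the stream to random IO. */
    void seek(long target) {
        if (target < pos && policy == Policy.SEQUENTIAL) {
            policy = Policy.RANDOM;
        }
        pos = target;
    }

    Policy policy() {
        return policy;
    }

    /**
     * Decide what to do with the currently open GET.
     * Returns true to abort the connection (more than one readahead range
     * is left before EOF), false to read through to EOF so that the
     * HTTPS connection can be recycled.
     */
    static boolean shouldAbort(long pos, long fileLength, long readaheadRange) {
        return pos < fileLength - readaheadRange;
    }
}
```

For a 1,000,000-byte file and a 65,536-byte readahead range, a stream at 
position 100 would abort, while one at position 990,000 would drain to EOF.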

But: Hive now pushes down the (pruned) footer to the workers during a query, so 
you don't get that backward seek. So we do need the option to say "random IO" 
explicitly.

Related work

# HADOOP-15229 is coming in Hadoop 3.3, but I'm happy to backport that API to 
earlier releases (just not the more complex async GET call in S3A or S3 Select 
support)
# HADOOP-11867 discusses what we want from an async scatter/gather API which 
would be used by ORC + Parquet for their scanning, as they know their read 
patterns ahead of time. Rather than wait for the FS client to infer this, we can 
just give it the list and the callbacks and let it come up with an optimum read 
plan.
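
To make the "optimum read plan" idea concrete, here is one possible sketch: 
given the list of ranges a reader wants, sort them and merge any that are 
close together so they can be fetched with a single GET. The Range type, the 
plan() method and the maxGap parameter are all hypothetical, callbacks are 
left out, and this is not the API proposed in HADOOP-11867.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: coalesce requested byte ranges into fewer reads. */
final class ReadPlanSketch {

    /** A requested byte range [offset, offset + length). */
    static final class Range {
        final long offset;
        final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        long end() {
            return offset + length;
        }
    }

    /** Sort the ranges and merge those separated by at most maxGap bytes. */
    static List<Range> plan(List<Range> wanted, long maxGap) {
        List<Range> sorted = new ArrayList<>(wanted);
        sorted.sort(Comparator.comparingLong((Range r) -> r.offset));
        List<Range> merged = new ArrayList<>();
        for (Range r : sorted) {
            if (!merged.isEmpty()) {
                Range last = merged.get(merged.size() - 1);
                if (r.offset - last.end() <= maxGap) {
                    // Close enough: extend the previous read to cover this range.
                    long end = Math.max(last.end(), r.end());
                    merged.set(merged.size() - 1,
                        new Range(last.offset, end - last.offset));
                    continue;
                }
            }
            merged.add(r);
        }
        return merged;
    }
}
```

The trade-off sits in maxGap: a larger value means fewer requests but more 
bytes read and discarded, which is the kind of decision the store client is 
better placed to make than the file format reader.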

Happy to collaborate there too. It's just something we've not got round to 
doing, with too many short-term commitments. It'd potentially be dramatically 
better for working with the object stores.

> Documentation typos: fs.s3a.experimental.fadvise -> 
> fs.s3a.experimental.input.fadvise
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-16437
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16437
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: documentation, fs/s3
>    Affects Versions: 3.2.0, 3.3.0, 3.1.2
>            Reporter: Josh Rosen
>            Priority: Major
>             Fix For: 3.3.0
>
>
> The Hadoop documentation references {{fs.s3a.experimental.fadvise}} but I 
> believe this is a typo: the actual configuration key that gets read is 
> {{fs.s3a.experimental.input.fadvise}}.
> I'll submit a PR to fix this.


