[
https://issues.apache.org/jira/browse/HADOOP-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888794#comment-16888794
]
Steve Loughran commented on HADOOP-16437:
-----------------------------------------
We have space in S3AFileSystem.addDeprecatedKeys() for precisely this kind of
thing. Should be backported to branch-2 and 3.2.x.
3.2 will switch from sequential to random the first time you do a backward
seek, so when reading ORC & Parquet files, if they read the footer and then
seek back to the first columns they actually want to scan, it'll kick off the
switch. The random mode lets you do that without wasting the first read (if the
pos() after the footer read < (len(file) - fs.s3a.readahead.range) we'll abort
and switch to ranged GETs; otherwise it'll just scan to EOF and recycle that
HTTPS connection).
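
To make the abort-vs-drain rule above concrete, here is a rough sketch in plain
Java. It is only a model of the behaviour described in this comment, not the
S3AInputStream code: the class, enum and method names are all made up for
illustration, and the 64 KiB readahead is just an example value.

```java
// Illustrative model only; every name here is hypothetical, not Hadoop API.
final class ReadaheadSketch {

    /** What happens to the open HTTPS stream when a seek forces a reopen. */
    enum CloseAction { ABORT, DRAIN }

    /** Stand-ins for the two input policies discussed in the comment. */
    enum Policy { SEQUENTIAL, RANDOM }

    /**
     * Applies the rule quoted above: abort when
     * pos < (fileLength - readaheadRange), otherwise read to EOF so the
     * connection can be recycled.
     *
     * @param pos            current stream position, in bytes
     * @param fileLength     length of the object, in bytes
     * @param readaheadRange assumed value of fs.s3a.readahead.range, in bytes
     */
    static CloseAction onReopen(long pos, long fileLength, long readaheadRange) {
        return pos < fileLength - readaheadRange
                ? CloseAction.ABORT
                : CloseAction.DRAIN;
    }

    /** The first backward seek flips a sequential stream to random. */
    static Policy afterSeek(Policy current, long pos, long target) {
        return (current == Policy.SEQUENTIAL && target < pos)
                ? Policy.RANDOM
                : current;
    }

    public static void main(String[] args) {
        long len = 1_000_000L;       // example object length
        long readahead = 64 * 1024L; // example readahead range

        // Footer read finished at EOF, then a seek back to offset 4.
        System.out.println(onReopen(len, len, readahead));           // DRAIN
        System.out.println(afterSeek(Policy.SEQUENTIAL, len, 4L));   // RANDOM

        // Stream stopped near the start of the file: too much left to drain.
        System.out.println(onReopen(100L, len, readahead));          // ABORT
    }
}
```

A reader that finishes the footer at EOF lands in the DRAIN case, so the
connection goes back to the pool; a stream abandoned far from the end is
aborted instead.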
But: Hive now pushes down the (pruned) footer to the workers during a query, so
you don't get that backward seek. So we do need it to say "random IO".
Related work:
# HADOOP-15229 is coming in Hadoop 3.3, but I'm happy to backport that API to
earlier releases (just not the more complex async GET call in S3A or S3 Select
support)
# HADOOP-11867 discusses what we want from an async scatter/gather API which
would be used by ORC + Parquet for their scanning, as they know their read
patterns ahead of time. Rather than wait for the FS client to infer this, we can
just give it the list and the callbacks and let it come up with an optimum read
plan.
Happy to collaborate there too; it's just something we've not got round to
doing, with too many short-term commitments. It'd potentially be dramatically
better for working with the object stores.
> Documentation typos: fs.s3a.experimental.fadvise ->
> fs.s3a.experimental.input.fadvise
> -------------------------------------------------------------------------------------
>
> Key: HADOOP-16437
> URL: https://issues.apache.org/jira/browse/HADOOP-16437
> Project: Hadoop Common
> Issue Type: Bug
> Components: documentation, fs/s3
> Affects Versions: 3.2.0, 3.3.0, 3.1.2
> Reporter: Josh Rosen
> Priority: Major
> Fix For: 3.3.0
>
>
> The Hadoop documentation references {{fs.s3a.experimental.fadvise}} but I
> believe this is a typo: the actual configuration key that gets read is
> {{fs.s3a.experimental.input.fadvise}}.
> I'll submit a PR to fix this.