[ https://issues.apache.org/jira/browse/HADOOP-17400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated HADOOP-17400:
------------------------------------
Description:
Make listing in applications as fast as we can get it, especially for query
planning.
* All operations used in listing directories (for query planning etc.) should be
optimized for their primary use, being passed directories (not files), and so
be made faster for that case even at the expense of more remote IO when handed
files or empty directories.
* Remove needless calls to S3 wherever possible (e.g. {{getFileStatus("/")}}),
including making bucket existence probes optional.
* Support/enable asynchronous IO where possible.
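
As a rough illustration of the first bullet, here is a minimal Java sketch of a directory-first probe against a hypothetical in-memory store. The FakeStore, Kind and probe names are invented for this example and are not S3A classes, and the sketch assumes a LIST call that excludes the directory marker key itself. Because the LIST is issued first, a directory costs one request, while a file costs two and an empty directory (a "dir/" marker) costs three. It is not necessarily the probe order S3A itself uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class DirectoryFirstProbe {

    /** Hypothetical in-memory stand-in for an object store; counts every simulated request. */
    static final class FakeStore {
        private final TreeSet<String> keys = new TreeSet<>();
        int requests = 0;

        void put(String key) {
            keys.add(key);
        }

        /** Simulated HEAD: does this exact key exist? */
        boolean head(String key) {
            requests++;
            return keys.contains(key);
        }

        /** Simulated LIST: up to maxKeys keys under prefix, excluding the prefix key itself. */
        List<String> list(String prefix, int maxKeys) {
            requests++;
            List<String> out = new ArrayList<>();
            for (String k : keys.tailSet(prefix, true)) {
                if (!k.startsWith(prefix)) {
                    break; // sorted order: no further keys can match
                }
                if (k.equals(prefix)) {
                    continue; // skip the directory marker for the prefix itself
                }
                out.add(k);
                if (out.size() >= maxKeys) {
                    break;
                }
            }
            return out;
        }
    }

    enum Kind { DIRECTORY, FILE, EMPTY_DIRECTORY, NOT_FOUND }

    /** Probe order tuned for directories: LIST first, HEAD requests only as fallbacks. */
    static Kind probe(FakeStore store, String path) {
        if (!store.list(path + "/", 1).isEmpty()) {
            return Kind.DIRECTORY;        // 1 request
        }
        if (store.head(path)) {
            return Kind.FILE;             // 2 requests
        }
        if (store.head(path + "/")) {
            return Kind.EMPTY_DIRECTORY;  // 3 requests
        }
        return Kind.NOT_FOUND;            // 3 requests
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        store.put("data/year=2020/part-0000.parquet");
        store.put("data/_SUCCESS");
        store.put("empty/");

        for (String p : new String[] {"data", "data/_SUCCESS", "empty", "missing"}) {
            store.requests = 0;
            Kind kind = probe(store, p);
            System.out.println(p + " -> " + kind + " after " + store.requests + " request(s)");
        }
    }
}
```

Running main reports 1 request for "data", 2 for "data/_SUCCESS" and 3 each for "empty" and "missing": directories get cheaper, and files and empty directories pay for it.
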
Review higher-level APIs (glob status) and their uses in the FsShell, and
optimize those uses by minimising the number of FS API calls they make, with
the bonus goal of reducing the risk of 404 caching.
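
One possible way to cut FS calls during glob expansion is to skip listing the leading path components that contain no wildcards and start listing from the deepest literal directory. A pattern with no wildcards at all then needs a single status probe rather than any listing. The hypothetical GlobPrefix class below only computes that literal prefix. It conservatively treats every character of Hadoop's glob syntax (* ? [ ] { } and the backslash escape) as a wildcard, and it makes no claim to reflect how Hadoop's own glob code works.

```java
import java.util.ArrayList;
import java.util.List;

public class GlobPrefix {

    // Characters treated as glob syntax in this sketch: * ? [ ] { } and the backslash escape.
    private static final String GLOB_CHARS = "*?[]{}\\";

    static boolean hasGlobChars(String component) {
        for (int i = 0; i < component.length(); i++) {
            if (GLOB_CHARS.indexOf(component.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    /** Joins the leading path components that contain no glob characters. */
    static String literalPrefix(String pattern) {
        List<String> literal = new ArrayList<>();
        for (String part : pattern.split("/", -1)) {
            if (hasGlobChars(part)) {
                break; // everything from here on needs a listing to expand
            }
            literal.add(part);
        }
        return String.join("/", literal);
    }

    public static void main(String[] args) {
        String[] patterns = {
            "s3a://bucket/tables/sales/year=2020/*/part-*.parquet",
            "s3a://bucket/tables/sales/_SUCCESS"
        };
        for (String pattern : patterns) {
            String prefix = literalPrefix(pattern);
            if (prefix.equals(pattern)) {
                // No wildcards: a single status probe answers the glob.
                System.out.println(pattern + " -> no listing needed");
            } else {
                System.out.println(pattern + " -> start listing at " + prefix);
            }
        }
    }
}
```

For the first pattern the listing would start at s3a://bucket/tables/sales/year=2020, so none of the four parent levels above the first wildcard is ever listed.
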
Work with downstream projects to move to the FS APIs which work best in this
world, primarily the recursive listing operations and those which return
RemoteIterator<FileStatus>, and so make any asynchronous page fetching
operations useful.
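
To show why paged, iterator-returning APIs make asynchronous page fetching useful, here is a minimal Java sketch of an iterator that starts fetching page N+1 in the background while the caller consumes page N. The RemoteIterator interface is redeclared with the same hasNext()/next() shape as org.apache.hadoop.fs.RemoteIterator so the example compiles without Hadoop. Page, PrefetchingIterator and the token-based fetcher are hypothetical stand-ins for a paged LIST call, not S3A code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

public class PrefetchingListing {

    /** Same shape as org.apache.hadoop.fs.RemoteIterator, redeclared so this compiles without Hadoop. */
    interface RemoteIterator<E> {
        boolean hasNext() throws IOException;
        E next() throws IOException;
    }

    /** Hypothetical page of a paged LIST response; nextToken is null on the last page. */
    record Page(List<String> items, String nextToken) {}

    /** Yields entries page by page, requesting page N+1 as soon as page N arrives. */
    static final class PrefetchingIterator implements RemoteIterator<String> {
        private final Function<String, Page> fetcher;
        private final ExecutorService executor;
        private Iterator<String> current = List.<String>of().iterator();
        private CompletableFuture<Page> pending;

        PrefetchingIterator(Function<String, Page> fetcher, ExecutorService executor) {
            this.fetcher = fetcher;
            this.executor = executor;
            // The first page (token null) is requested asynchronously too.
            this.pending = CompletableFuture.supplyAsync(() -> fetcher.apply(null), executor);
        }

        @Override
        public boolean hasNext() throws IOException {
            while (!current.hasNext()) {
                if (pending == null) {
                    return false; // last page already consumed
                }
                Page page = await(pending); // blocks only if the prefetch is still in flight
                String token = page.nextToken();
                // Kick off the next fetch before the caller starts on this page.
                pending = (token == null)
                        ? null
                        : CompletableFuture.supplyAsync(() -> fetcher.apply(token), executor);
                current = page.items().iterator();
            }
            return true;
        }

        @Override
        public String next() throws IOException {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.next();
        }

        private static Page await(CompletableFuture<Page> future) throws IOException {
            try {
                return future.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException(e);
            } catch (ExecutionException e) {
                throw new IOException(e.getCause());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical three-page listing; tokens are simply page indices.
        List<List<String>> pages = List.of(List.of("a/1", "a/2"), List.of("a/3"), List.of("a/4", "a/5"));
        Function<String, Page> fetcher = token -> {
            int i = (token == null) ? 0 : Integer.parseInt(token);
            String next = (i + 1 < pages.size()) ? String.valueOf(i + 1) : null;
            return new Page(pages.get(i), next);
        };
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            RemoteIterator<String> it = new PrefetchingIterator(fetcher, pool);
            List<String> seen = new ArrayList<>();
            while (it.hasNext()) {
                seen.add(it.next());
            }
            System.out.println(seen); // [a/1, a/2, a/3, a/4, a/5]
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller that loops over hasNext()/next() and does per-entry work overlaps that work with the next page's round trip. A caller of an API that returns a fully materialized array, such as listStatus(), gets no such overlap, which is why moving downstream code to iterator-based listings matters.
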
was:
Make listing in applications as fast as we can get it, especially for query
planning.
* All operations used in listing directories (for query planning etc.) should be
optimized for their primary use, being passed directories (not files), and so
be made faster for that case even at the expense of more remote IO when handed
files or empty directories.
* Remove needless calls to S3 wherever possible (e.g. getFileStatus("/")),
including making bucket existence probes optional.
* Support/enable asynchronous IO where possible.
Review higher-level APIs (glob status) and their uses in the FsShell, and
optimize those uses by minimising the number of FS API calls they make, with
the bonus goal of reducing the risk of 404 caching.
Work with downstream projects to move to the FS APIs which work best in this
world, primarily the recursive listing operations and those which return
RemoteIterator<FileStatus>, and so make any asynchronous page fetching
operations useful.
> Optimize S3A for maximum performance in directory listings
> ----------------------------------------------------------
>
> Key: HADOOP-17400
> URL: https://issues.apache.org/jira/browse/HADOOP-17400
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Affects Versions: 3.3.0
> Reporter: Steve Loughran
> Assignee: Mukund Thakur
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)