[
https://issues.apache.org/jira/browse/HDFS-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15329298#comment-15329298
]
Steve Loughran commented on HDFS-10413:
---------------------------------------
Can I note that from the perspective of S3A, using listFiles(path, recursive=true)
is significantly faster than a treewalk built on listStatus(). If code were
encouraged to use that API rather than its own treewalk, then anything that works
with object stores would see a significant speedup.
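
To illustrate why the recursive call can win on an object store, here is a toy Java model, not S3A code. It assumes the store is a flat, sorted key space where every listing is one remote request; ToyObjectStore, ListingDemo and the request counter are all hypothetical names for this sketch. In real Hadoop code the call in question is FileSystem.listFiles(Path, boolean), which returns a RemoteIterator<LocatedFileStatus>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of an object store: a flat, sorted key space plus a request counter. */
class ToyObjectStore {
    private final TreeSet<String> keys = new TreeSet<>();
    int listRequests = 0; // number of simulated LIST calls issued so far

    void put(String key) { keys.add(key); }

    /** One LIST call returning every key under the prefix (paging is ignored here). */
    List<String> listPrefix(String prefix) {
        listRequests++;
        List<String> out = new ArrayList<>();
        for (String k : keys.tailSet(prefix)) {
            if (!k.startsWith(prefix)) break; // keys sharing a prefix are contiguous
            out.add(k);
        }
        return out;
    }

    /** One LIST call with a "/" delimiter: immediate children only; directories end in "/". */
    List<String> listChildren(String prefix) {
        listRequests++;
        TreeSet<String> children = new TreeSet<>();
        for (String k : keys.tailSet(prefix)) {
            if (!k.startsWith(prefix)) break;
            int slash = k.indexOf('/', prefix.length());
            children.add(slash < 0 ? k : k.substring(0, slash + 1));
        }
        return new ArrayList<>(children);
    }
}

class ListingDemo {
    /** Client-side treewalk: one listing request per directory visited. */
    static List<String> treeWalk(ToyObjectStore store, String dir) {
        List<String> files = new ArrayList<>();
        for (String child : store.listChildren(dir)) {
            if (child.endsWith("/")) {
                files.addAll(treeWalk(store, child));
            } else {
                files.add(child);
            }
        }
        return files;
    }

    /** Flat listing: one request for the whole subtree, in the spirit of listFiles(recursive=true). */
    static List<String> flatList(ToyObjectStore store, String dir) {
        return store.listPrefix(dir);
    }
}
```

In this model the treewalk costs one request per directory, while the flat listing costs one request for the whole subtree (a real store would page large results, so the count grows with the number of entries rather than the number of directories).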
Also, listFiles and similar calls return a RemoteIterator. An implementation can
be async, to the extent that the next results can be arriving while the client is
still processing the previous ones. The code I'm writing in HADOOP-13208 doesn't
do that, but it does do windowed queries: only one window-full of files is listed,
filtered and made available at a time. This keeps memory consumption down.
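
A rough Java sketch of the windowed idea, assuming nothing about the actual HADOOP-13208 code: RemoteIter is a local stand-in shaped like Hadoop's RemoteIterator (both methods may throw IOException), and PageSource and WindowedIterator are hypothetical names. The sketch is synchronous; it only shows that at most one filtered window is held in memory at a time.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Local stand-in shaped like Hadoop's RemoteIterator. */
interface RemoteIter<T> {
    boolean hasNext() throws IOException;
    T next() throws IOException;
}

/** Hypothetical paged source: returns up to `limit` entries starting at `offset`. */
interface PageSource<T> {
    List<T> fetch(int offset, int limit) throws IOException;
}

/** Buffers at most one window of filtered results; fetches the next window on demand. */
class WindowedIterator<T> implements RemoteIter<T> {
    private final PageSource<T> source;
    private final int windowSize;
    private final Predicate<T> filter;
    private final Deque<T> window = new ArrayDeque<>();
    private int offset = 0;
    private boolean exhausted = false;
    int fetches = 0; // number of window requests issued, exposed for inspection

    WindowedIterator(PageSource<T> source, int windowSize, Predicate<T> filter) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.source = source;
        this.windowSize = windowSize;
        this.filter = filter;
    }

    /** Refill only when the buffer is empty; skip windows that the filter empties completely. */
    private void fill() throws IOException {
        while (window.isEmpty() && !exhausted) {
            List<T> page = source.fetch(offset, windowSize);
            fetches++;
            offset += page.size();
            if (page.size() < windowSize) {
                exhausted = true; // a short page means the source has no more entries
            }
            for (T item : page) {
                if (filter.test(item)) {
                    window.add(item);
                }
            }
        }
    }

    @Override
    public boolean hasNext() throws IOException {
        fill();
        return !window.isEmpty();
    }

    @Override
    public T next() throws IOException {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return window.remove();
    }
}
```

An async variant would start fetching the next window while the caller drains the current one; that is deliberately left out here, matching the comment that the windowed code does not do it.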
> Implement asynchronous listStatus for DistributedFileSystem
> -----------------------------------------------------------
>
> Key: HDFS-10413
> URL: https://issues.apache.org/jira/browse/HDFS-10413
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Xiaobing Zhou
> Assignee: Xiaobing Zhou
>
> Per the
> [comment|https://issues.apache.org/jira/browse/HDFS-9924?focusedCommentId=15285597&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15285597]
> from [~mingma], this Jira tracks efforts of implementing async listStatus.