[
https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491400#comment-16491400
]
Andrew Wang commented on HDFS-13616:
------------------------------------
bq. if I pass a file (instead of a directory), will I get back a standalone
PartialListing that only includes the FileStatus for that file?
Good question! I have a unit test that covers this too. It returns a
PartialListing with just the FileStatus of the file, and getParent returns
the file's path. This is the same behavior as listLocatedStatus.
This makes me realize, though, that "getParent" is not the best name, since it
won't always be the parent. Maybe getSourcePath or getListedPath? Happy to take
suggestions here, and yes, I can beef up the documentation around this too.
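To make the file-versus-directory behavior concrete, here is a rough sketch in
plain Java. It does not use the classes from the patch: Listing, batchedList,
listedPath, and the toy paths are all hypothetical names made up for
illustration, and the in-memory map only stands in for the namespace. As a
simplification, any path that is not a known directory is treated as a file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchedListingSketch {

    /** Hypothetical stand-in for one per-path result of a batched listing. */
    static final class Listing {
        final String listedPath;     // the path that was passed in by the caller
        final List<String> entries;  // paths of the statuses returned for it

        Listing(String listedPath, List<String> entries) {
            this.listedPath = listedPath;
            this.entries = entries;
        }
    }

    // Toy namespace: directory path -> child paths.
    // Simplification: any path not in this map is treated as a file.
    static final Map<String, List<String>> DIRS = new LinkedHashMap<>();
    static {
        DIRS.put("/warehouse/t/p=1",
            Arrays.asList("/warehouse/t/p=1/a.parquet", "/warehouse/t/p=1/b.parquet"));
        DIRS.put("/warehouse/t/p=2",
            Arrays.asList("/warehouse/t/p=2/c.parquet"));
    }

    /** Lists several paths in one call, returning one Listing per input path. */
    static List<Listing> batchedList(List<String> paths) {
        List<Listing> out = new ArrayList<>();
        for (String p : paths) {
            List<String> children = DIRS.get(p);
            if (children != null) {
                // Directory: the entries are its children.
                out.add(new Listing(p, children));
            } else {
                // File: a standalone listing holding only the file itself,
                // so the listed path is the file's own path, not its parent.
                out.add(new Listing(p, Collections.singletonList(p)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Listing> results = batchedList(
            Arrays.asList("/warehouse/t/p=1", "/warehouse/t/p=2/c.parquet"));
        for (Listing l : results) {
            System.out.println(l.listedPath + " -> " + l.entries);
        }
    }
}
```

In the file case the listed path and the single entry are the same path, which
is why a name along the lines of getListedPath reads more accurately than
getParent.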
> Batch listing of multiple directories
> -------------------------------------
>
> Key: HDFS-13616
> URL: https://issues.apache.org/jira/browse/HDFS-13616
> Project: Hadoop HDFS
> Issue Type: New Feature
> Affects Versions: 3.2.0
> Reporter: Andrew Wang
> Assignee: Andrew Wang
> Priority: Major
> Attachments: BenchmarkListFiles.java, HDFS-13616.001.patch,
> HDFS-13616.002.patch
>
>
> One of the dominant workloads for external metadata services is listing of
> partition directories. This can end up being bottlenecked on RTT time when
> partition directories contain a small number of files. This is fairly common,
> since fine-grained partitioning is used for partition pruning by the query
> engines.
> A batched listing API that takes multiple paths amortizes the RTT cost.
> Initial benchmarks show a 10-20x improvement in metadata loading performance.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)