[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490068#comment-16490068 ]

Andrew Wang commented on HDFS-13616:
------------------------------------
The latest patch addresses some precommit issues. As stated earlier, non-HDFS
filesystems will throw UnsupportedOperationException. One correction to my
earlier comment: the default listing limit is 1000, not 100. 100 is the
current default limit on the number of paths that can be listed per batched
listing call.

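To illustrate the two points above (the per-call path limit and the UnsupportedOperationException on non-HDFS filesystems), here is a minimal client-side sketch. It is not the API from the patch: the class name, `chunk`, `listAll`, and the two lister functions are hypothetical stand-ins, paths are plain strings, and the only figure taken from the comment is the limit of 100 paths per batched call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch; none of these names come from the HDFS-13616 patch.
class BatchedListingSketch {
    // Assumed per-call path limit, taken from the comment above (default 100).
    static final int MAX_PATHS_PER_BATCH = 100;

    /** Split paths into consecutive chunks of at most maxPerBatch entries. */
    static List<List<String>> chunk(List<String> paths, int maxPerBatch) {
        if (maxPerBatch <= 0) {
            throw new IllegalArgumentException("maxPerBatch must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < paths.size(); i += maxPerBatch) {
            int end = Math.min(i + maxPerBatch, paths.size());
            batches.add(new ArrayList<>(paths.subList(i, end)));
        }
        return batches;
    }

    /**
     * List every path, one batched call per chunk. If the batched lister
     * throws UnsupportedOperationException (as a non-HDFS filesystem would),
     * fall back to one single-directory listing per path in that chunk.
     */
    static List<String> listAll(List<String> paths,
                                Function<List<String>, List<String>> batchedLister,
                                Function<String, List<String>> singleLister) {
        List<String> entries = new ArrayList<>();
        for (List<String> batch : chunk(paths, MAX_PATHS_PER_BATCH)) {
            try {
                entries.addAll(batchedLister.apply(batch));
            } catch (UnsupportedOperationException e) {
                for (String path : batch) {
                    entries.addAll(singleLister.apply(path));
                }
            }
        }
        return entries;
    }
}
```

With 250 input paths this issues three batched calls (100, 100, and 50 paths). Whether a real client should fall back silently or surface the exception is a design choice this sketch does not settle.
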
Hi Nicholas, thanks for taking a look. Currently we don't see a need for API
support beyond listing. The workload we're looking at is metadata loading for
applications like Hive and Impala.

Regarding an async API, Todd's benchmarking shows that the batched API is more
CPU efficient than processing individual listing calls. It beats the 5-thread
case for sparse directories in both CPU time and wall time. My benchmarking
additionally shows that the batched API generates significantly less garbage.
This batched listing API could also be combined with an async API (or a thread
pool), so it's not an "either/or" situation.

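As a rough illustration of how batching and a thread pool could be combined, the sketch below submits one task per batch to a fixed-size pool and collects the results in submission order. Everything here is an assumption for illustration: `ParallelBatchedListing`, `listInParallel`, and the `batchedLister` function are hypothetical names, paths are plain strings, and no Hadoop API is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch; none of these names come from the HDFS-13616 patch.
class ParallelBatchedListing {
    /**
     * Split paths into batches of at most batchSize, run one batched listing
     * call per batch on a fixed thread pool, and return all entries in the
     * order the batches were submitted.
     */
    static List<String> listInParallel(List<String> paths, int batchSize, int threads,
                                       Function<List<String>, List<String>> batchedLister) {
        if (batchSize <= 0 || threads <= 0) {
            throw new IllegalArgumentException("batchSize and threads must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int i = 0; i < paths.size(); i += batchSize) {
                int end = Math.min(i + batchSize, paths.size());
                List<String> batch = new ArrayList<>(paths.subList(i, end));
                Callable<List<String>> task = () -> batchedLister.apply(batch);
                futures.add(pool.submit(task));
            }
            List<String> entries = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                entries.addAll(future.get()); // blocks until that batch finishes
            }
            return entries;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while listing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("batched listing failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Each task still carries many paths, so the number of round trips stays at one per batch while several batches can be in flight at once.
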
> Batch listing of multiple directories
> -------------------------------------
>
> Key: HDFS-13616
> URL: https://issues.apache.org/jira/browse/HDFS-13616
> Project: Hadoop HDFS
> Issue Type: New Feature
> Affects Versions: 3.2.0
> Reporter: Andrew Wang
> Assignee: Andrew Wang
> Priority: Major
> Attachments: HDFS-13616.001.patch, HDFS-13616.002.patch
>
>
> One of the dominant workloads for external metadata services is listing of
> partition directories. This can end up being bottlenecked on round-trip time
> (RTT) when partition directories contain a small number of files. This is
> fairly common, since fine-grained partitioning is used for partition pruning
> by the query engines.
> A batched listing API that takes multiple paths amortizes the RTT cost.
> Initial benchmarks show a 10-20x improvement in metadata loading performance.
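
The amortization argument in the description can be made concrete with a back-of-the-envelope model. The sketch below counts network round trips only: it assumes one round trip per listing call, a batch size supplied by the caller, and no server-side or transfer time, so it says nothing about the measured 10-20x figure. The class and method names are hypothetical.

```java
// Hypothetical model of round-trip counts; not taken from the patch.
class RttModel {
    /** Round trips needed to list n directories with at most batchSize paths per call. */
    static long roundTrips(long n, long batchSize) {
        if (n < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("n must be >= 0 and batchSize > 0");
        }
        return (n + batchSize - 1) / batchSize; // ceiling division
    }

    /** Time spent purely waiting on the network, in milliseconds. */
    static long networkWaitMillis(long n, long batchSize, long rttMillis) {
        return roundTrips(n, batchSize) * rttMillis;
    }
}
```

Under this model, listing 1000 sparse partition directories one at a time costs 1000 round trips, while batches of 100 paths cost 10.
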