[
https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489530#comment-16489530
]
Todd Lipcon commented on HDFS-13616:
------------------------------------
One other performance point: in addition to reducing "wait time" due to network
RTT, this should also result in a net reduction in load on the NN vs separate
calls. That's because we amortize fixed RPC costs like context switches to and
from the IPC threads, should get much better CPU cache locality (both
instruction and data caches), and amortize lock acquisition overhead on the FSN
lock.
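To make the lock-amortization point concrete, here is a rough, self-contained sketch. It is a toy model, not NameNode code: the class LockAmortizationSketch, its in-memory map, and the acquisition counter are all hypothetical stand-ins, and a plain ReentrantReadWriteLock plays the role of the FSN lock. It only counts how many times the lock is taken for N separate listings vs one batched listing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy namespace (hypothetical): counts read-lock acquisitions per listing style.
public class LockAmortizationSketch {
    // Stand-in for the FSN lock; the real lock is not modeled here.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, List<String>> dirs = new TreeMap<>();
    private final AtomicLong readLockAcquisitions = new AtomicLong();

    public void addFile(String dir, String file) {
        lock.writeLock().lock();
        try {
            dirs.computeIfAbsent(dir, k -> new ArrayList<>()).add(file);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // One directory per call: one read-lock acquisition per directory.
    public List<String> list(String dir) {
        lock.readLock().lock();
        readLockAcquisitions.incrementAndGet();
        try {
            return new ArrayList<>(dirs.getOrDefault(dir, Collections.emptyList()));
        } finally {
            lock.readLock().unlock();
        }
    }

    // Many directories per call: a single read-lock acquisition for the batch.
    public Map<String, List<String>> listBatch(List<String> batch) {
        lock.readLock().lock();
        readLockAcquisitions.incrementAndGet();
        try {
            Map<String, List<String>> out = new LinkedHashMap<>();
            for (String dir : batch) {
                out.put(dir, new ArrayList<>(dirs.getOrDefault(dir, Collections.emptyList())));
            }
            return out;
        } finally {
            lock.readLock().unlock();
        }
    }

    public long getReadLockAcquisitions() {
        return readLockAcquisitions.get();
    }

    public void resetCounter() {
        readLockAcquisitions.set(0);
    }
}
```

A real implementation would likely cap how long the lock is held per batch so that writers are not starved; this sketch ignores that trade-off.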
I think the batched API also opens up some future optimizations like amortizing
the path traversal cost in the common case that all of the arguments share a
common prefix path. This is exceedingly common in applications like Hive where
the planner must fetch file lists for
/user/hive/warehouse/dbname/tablename/{...100 partitions...}. Again this should
be a net reduction in NN CPU usage as well as an improvement in client-visible
wall-clock time.
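As a rough illustration of the shared-prefix idea, the sketch below walks a toy in-memory directory tree and counts per-component lookups. Everything in it is an assumption made for illustration: the class PrefixAmortizationSketch, the Node type, and the lookup counter are hypothetical and do not reflect how the NN actually resolves paths. The batched variant resolves the common prefix once and then resolves only the remaining suffix for each path.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy directory tree (hypothetical): counts component lookups during resolution.
public class PrefixAmortizationSketch {
    public static final class Node {
        final Map<String, Node> children = new HashMap<>();
    }

    private final Node root = new Node();
    private long lookups = 0;

    public long getLookups() { return lookups; }
    public void resetLookups() { lookups = 0; }

    public void mkdirs(String path) {
        Node cur = root;
        for (String c : components(path)) {
            cur = cur.children.computeIfAbsent(c, k -> new Node());
        }
    }

    static List<String> components(String path) {
        List<String> out = new ArrayList<>();
        for (String c : path.split("/")) {
            if (!c.isEmpty()) out.add(c);
        }
        return out;
    }

    // Walks from 'start', counting one lookup per component visited.
    private Node walk(Node start, List<String> comps) {
        Node cur = start;
        for (String c : comps) {
            lookups++;
            cur = cur.children.get(c);
            if (cur == null) return null;
        }
        return cur;
    }

    // Baseline: every path is resolved from the root.
    public List<Node> resolveSeparately(List<String> paths) {
        List<Node> out = new ArrayList<>();
        for (String p : paths) out.add(walk(root, components(p)));
        return out;
    }

    // Batched: resolve the shared prefix once, then only each path's suffix.
    public List<Node> resolveBatched(List<String> paths) {
        List<Node> out = new ArrayList<>();
        if (paths.isEmpty()) return out;
        List<List<String>> all = new ArrayList<>();
        for (String p : paths) all.add(components(p));
        int common = commonPrefixLength(all);
        Node base = walk(root, all.get(0).subList(0, common));
        for (List<String> comps : all) {
            out.add(base == null ? null : walk(base, comps.subList(common, comps.size())));
        }
        return out;
    }

    // Number of leading components shared by every path in the batch.
    static int commonPrefixLength(List<List<String>> all) {
        int n = Integer.MAX_VALUE;
        for (List<String> comps : all) n = Math.min(n, comps.size());
        for (int i = 0; i < n; i++) {
            String c = all.get(0).get(i);
            for (List<String> comps : all) {
                if (!comps.get(i).equals(c)) return i;
            }
        }
        return n;
    }
}
```

In this toy model, 100 partition directories under a shared five-component parent cost 600 lookups when resolved separately and 105 when batched (5 for the prefix plus 1 per partition). The real savings would depend on how path resolution is actually implemented in the NN.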
Andrew, any chance you've done a simple benchmark on the CPU time spent
namenode-side, eg of 1000 "listdir" calls vs 1 batched call for the same set of
directories? I can help with that if you haven't already set something up.
> Batch listing of multiple directories
> -------------------------------------
>
> Key: HDFS-13616
> URL: https://issues.apache.org/jira/browse/HDFS-13616
> Project: Hadoop HDFS
> Issue Type: New Feature
> Affects Versions: 3.2.0
> Reporter: Andrew Wang
> Assignee: Andrew Wang
> Priority: Major
> Attachments: HDFS-13616.001.patch
>
>
> One of the dominant workloads for external metadata services is listing of
> partition directories. This can end up being bottlenecked on RTT time when
> partition directories contain a small number of files. This is fairly common,
> since fine-grained partitioning is used for partition pruning by the query
> engines.
> A batched listing API that takes multiple paths amortizes the RTT cost.
> Initial benchmarks show a 10-20x improvement in metadata loading performance.