[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490032#comment-16490032 ]
Andrew Wang commented on HDFS-13616:
------------------------------------
Hi Zhe, thanks for taking a look! This API respects the existing lsLimit
setting of 100, and also limits the number of paths that can be listed in a
single batch call. This means that the per-call overhead is very similar to the
existing RemoteIterator<FileStatus> calls when returning 100-item partial
listings. Todd saw ~7ms RPC handling times for 100-item batches on a cluster,
which feels like the right granularity for holding a read lock.
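
As a rough illustration of the limits described above, here is a minimal, self-contained sketch of how a batched listing call might cap both the entries returned and the paths accepted per call. This is not the code from the HDFS-13616 patch. The class and method names (BatchedListingSketch, listBatch, countCalls), the MAX_PATHS_PER_CALL value, and the in-memory map standing in for the namespace are all assumptions made for illustration. Only the 100-entry limit comes from the comment above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BatchedListingSketch {
    // From the comment above: at most 100 entries are returned per call.
    static final int LS_LIMIT = 100;
    // Hypothetical cap on paths accepted per call; the real value is not given here.
    static final int MAX_PATHS_PER_CALL = 100;

    /** One partial result plus the cursor needed to resume the listing. */
    static final class Batch {
        final List<String> entries = new ArrayList<>();
        int nextDir;    // index of the directory to resume from
        int nextEntry;  // offset of the next child inside that directory
        boolean done;   // true once every requested directory is fully listed
    }

    /** Lists children of several directories, stopping once LS_LIMIT entries are collected. */
    static Batch listBatch(List<String> dirs, Map<String, List<String>> namespace,
                           int startDir, int startEntry) {
        if (dirs.size() > MAX_PATHS_PER_CALL) {
            throw new IllegalArgumentException("too many paths in one batched call");
        }
        Batch batch = new Batch();
        int d = startDir;
        int e = startEntry;
        while (d < dirs.size() && batch.entries.size() < LS_LIMIT) {
            List<String> children =
                namespace.getOrDefault(dirs.get(d), Collections.<String>emptyList());
            while (e < children.size() && batch.entries.size() < LS_LIMIT) {
                batch.entries.add(dirs.get(d) + "/" + children.get(e));
                e++;
            }
            if (e >= children.size()) {  // directory exhausted, move to the next one
                d++;
                e = 0;
            }
        }
        batch.nextDir = d;
        batch.nextEntry = e;
        batch.done = d >= dirs.size();
        return batch;
    }

    /** Number of calls needed to list every directory in dirs completely. */
    static int countCalls(List<String> dirs, Map<String, List<String>> namespace) {
        int calls = 0;
        Batch b = new Batch();  // initial cursor: directory 0, entry 0, not done
        while (!b.done) {
            b = listBatch(dirs, namespace, b.nextDir, b.nextEntry);
            calls++;
        }
        return calls;
    }

    /** In-memory stand-in for a namespace: /d0 .. /d(N-1), each with filesPerDir files. */
    static Map<String, List<String>> syntheticNamespace(int numDirs, int filesPerDir) {
        Map<String, List<String>> ns = new LinkedHashMap<>();
        for (int i = 0; i < numDirs; i++) {
            List<String> files = new ArrayList<>();
            for (int f = 0; f < filesPerDir; f++) {
                files.add("file" + f);
            }
            ns.put("/d" + i, files);
        }
        return ns;
    }

    public static void main(String[] args) {
        Map<String, List<String>> ns = syntheticNamespace(100, 1);
        List<String> dirs = new ArrayList<>(ns.keySet());
        System.out.println("batched calls for 100 one-file directories: " + countCalls(dirs, ns));
    }
}
```

In this sketch, 100 one-file directories fit into a single call, whereas listing them one directory at a time would take 100 calls. A directory larger than the limit is simply resumed from the returned cursor on the next call.
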
To answer Todd's question about benchmarking, I wrote a little unit test that
invokes NameNodeRpcServer directly and times with System.nanoTime(). I made a
synthetic directory structure with 30,000 directories, each with one file. This
makes it a best-case scenario for the batched listing API. To allow for JVM
warmup, I let the benchmarks run for about 30s before recording with JFR/JMC.
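
The general shape of such a timing loop (a discarded warmup phase followed by a measured phase timed with System.nanoTime()) can be sketched as below. This is not the actual unit test. The class WarmupTimerSketch, its methods, and the stand-in workload are hypothetical, and the workload would be replaced by the real listing call in practice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

final class WarmupTimerSketch {
    /** Runs op repeatedly for about durationNanos and returns the total items it reported. */
    static long runFor(IntSupplier op, long durationNanos) {
        long items = 0;
        long start = System.nanoTime();
        // Compare elapsed differences, never absolute nanoTime values.
        while (System.nanoTime() - start < durationNanos) {
            items += op.getAsInt();
        }
        return items;
    }

    /** Warms up first (result discarded), then measures throughput in items per second. */
    static double itemsPerSecond(IntSupplier op, long warmupNanos, long measureNanos) {
        runFor(op, warmupNanos);  // lets the JIT compile the hot path before measuring
        long start = System.nanoTime();
        long items = runFor(op, measureNanos);
        long elapsed = Math.max(1L, System.nanoTime() - start);
        return items * 1e9 / elapsed;
    }

    public static void main(String[] args) {
        // Stand-in workload: builds a 100-element "listing" and reports its size.
        IntSupplier standIn = () -> {
            List<String> listing = new ArrayList<>();
            for (int i = 0; i < 100; i++) {
                listing.add("/dir/file" + i);
            }
            return listing.size();
        };
        double rate = itemsPerSecond(standIn, 2_000_000_000L, 2_000_000_000L);
        System.out.printf("%.0f items/second%n", rate);
    }
}
```

Having the operation return the number of items it handled lets a per-directory call and a batched call be compared on the same items/second scale.
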
I was able to list 8.4x more LocatedFileStatuses/second with the batched
listing. JMC showed a TLAB allocation rate of 5x. Non-TLAB allocation was
trivial. This means we're much more CPU efficient per-FileStatus, and also
doing less allocation.
Since this did not include network round-trip time (RTT) or lock contention
from concurrent threads, a more realistic benchmark might do even better. I
think this explains the 10-20x that Todd saw when benchmarking on a real cluster.
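
A back-of-the-envelope model shows why adding round trips would tend to widen the gap. The sketch below charges every call one RTT plus a server-side handling cost and compares per-directory calls against batched calls. The class RttAmortizationSketch and its methods are hypothetical, the model ignores lock contention entirely, and the per-directory cost and RTT values in main are invented placeholders rather than measurements. Only the 30,000 directories, the batch size of 100, and the ~7ms per-batch handling time come from this thread.

```java
final class RttAmortizationSketch {
    /** Total time when every directory is listed with its own call. */
    static double unbatchedMs(int numDirs, double rttMs, double perDirServerMs) {
        return numDirs * (rttMs + perDirServerMs);
    }

    /** Total time when up to batchSize directories share one call. */
    static double batchedMs(int numDirs, int batchSize, double rttMs, double perBatchServerMs) {
        int calls = (numDirs + batchSize - 1) / batchSize;  // ceiling division
        return calls * (rttMs + perBatchServerMs);
    }

    /** Ratio of unbatched to batched total time under this simple model. */
    static double speedup(int numDirs, int batchSize, double rttMs,
                          double perDirServerMs, double perBatchServerMs) {
        return unbatchedMs(numDirs, rttMs, perDirServerMs)
            / batchedMs(numDirs, batchSize, rttMs, perBatchServerMs);
    }

    public static void main(String[] args) {
        int numDirs = 30_000;          // from the synthetic benchmark above
        int batchSize = 100;           // from the 100-item batches above
        double perBatchServerMs = 7.0; // the ~7ms figure Todd reported
        double perDirServerMs = 0.5;   // invented placeholder, not a measurement
        double[] rtts = {0.0, 0.5, 2.0};  // invented placeholder RTTs in ms
        for (double rtt : rtts) {
            System.out.printf("rtt=%.1fms speedup=%.1fx%n",
                rtt, speedup(numDirs, batchSize, rtt, perDirServerMs, perBatchServerMs));
        }
    }
}
```

For full batches the model reduces to batchSize * (rtt + perDir) / (rtt + perBatch). Whenever one batch costs more to handle than one per-directory call, that ratio grows with RTT and approaches batchSize, which is consistent with a zero-RTT benchmark understating the benefit.
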
> Batch listing of multiple directories
> -------------------------------------
>
> Key: HDFS-13616
> URL: https://issues.apache.org/jira/browse/HDFS-13616
> Project: Hadoop HDFS
> Issue Type: New Feature
> Affects Versions: 3.2.0
> Reporter: Andrew Wang
> Assignee: Andrew Wang
> Priority: Major
> Attachments: HDFS-13616.001.patch
>
>
> One of the dominant workloads for external metadata services is listing of
> partition directories. This can end up being bottlenecked on RTT time when
> partition directories contain a small number of files. This is fairly common,
> since fine-grained partitioning is used for partition pruning by the query
> engines.
> A batched listing API that takes multiple paths amortizes the RTT cost.
> Initial benchmarks show a 10-20x improvement in metadata loading performance.