[ 
https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489969#comment-16489969
 ] 

Todd Lipcon commented on HDFS-13616:
------------------------------------

[~zhz] My feeling on these sorts of APIs is that a user who wants to list a 
bunch of directories is just as likely to do so with an equivalent for loop as 
with a 'batchListDirectories(List<Path>)' API. In particular, applications 
like MR, Hive, Impala, Presto, etc., need this workflow in order to collect all 
the input paths from a list of partition directories, so they will do this 
whether we provide a specific API or not.

Our belief is that a batch API gives us a better chance of optimizing this 
common pattern than a bunch of separate API calls does, for example through 
the various amortization benefits mentioned above. If we eventually add 
compression of RPC responses, we also benefit from having larger responses 
with repeated substrings rather than a bunch of smaller responses.
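
To make the compression point concrete, here is a minimal Python sketch that 
compares compressing many small listing responses separately against 
compressing one large combined response. The path layout, file names, and 
partition count are made up for illustration, and zlib stands in for whatever 
codec RPC compression might eventually use; this is not a measurement of HDFS.

```python
import zlib


def fake_listing(partition: int, files: int = 5) -> bytes:
    """Build a hypothetical listing payload for one partition directory.

    The /warehouse/... layout is an assumption made for this example only.
    """
    lines = [
        f"/warehouse/sales/day={partition:04d}/part-{i:05d}.parquet"
        for i in range(files)
    ]
    return "\n".join(lines).encode("utf-8")


# One small response per partition directory (200 hypothetical partitions).
responses = [fake_listing(p) for p in range(200)]

# Option A: each small response is compressed on its own.
separate = sum(len(zlib.compress(r)) for r in responses)

# Option B: one large response carrying all listings is compressed once,
# so the repeated path prefixes are shared across partitions.
batched = len(zlib.compress(b"\n".join(responses)))

print(f"separate: {separate} bytes, batched: {batched} bytes")
```

The combined payload compresses better because the compressor can reuse the 
shared path prefixes across partitions instead of paying for them again in 
every small response.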

I just collected some numbers comparing three options for Impala fetching 
partition directory contents in order to plan a 'select *' from a large table. 
The table has 2181 partitions containing a total of 321,008 files. I'm testing 
against a 2.x branch build with this patch applied, and measuring the total NN 
CPU consumption for fetching all file block locations from these 2181 
directories. No other work is targeting this NN, and the NN is about 2ms away 
from the host doing the planning.
||Method||User CPU (sec)||System CPU (sec)||Total CPU (sec)||
|Non-batched (1 thread)|5.95|0.30|6.25|
|Non-batched (5 threads)|6.25|0.32|6.57|
|Batched (1 thread)|5.93|0.21|6.14|

The end-to-end planning time of the batched approach is not as good as the 
5-thread non-batched, but noticeably faster than the single-threaded 
non-batched. And the total CPU consumption is a few percent lower (especially 
system CPU). Note that this particular table isn't the optimal case for 
batching since the average partition has 147 files and thus each round trip can 
only fetch a few partitions' worth of info. I'll also try to gather some data 
on a table whose average partition has fewer files, where we'd expect the 
benefits to be larger.
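
As a rough illustration of why the files-per-partition ratio matters, here is a 
back-of-the-envelope sketch in Python. The partition count, file count, and 
2ms figure come from the test above; the per-response cap of 1000 entries is an 
assumption chosen for the example, not a measured setting, and the model counts 
only serial network wait for a single thread, ignoring NN CPU time.

```python
import math

partitions = 2181          # from the test table above
total_files = 321_008      # from the test table above
rtt_ms = 2.0               # "about 2ms away", treated here as one round trip

# ASSUMPTION: each response can carry at most this many entries.
entries_per_response = 1000

avg_files = total_files / partitions

# Non-batched: at least one round trip per directory (assumes every
# directory fits in a single response under the cap above).
non_batched_trips = partitions

# Batched: responses are filled up to the cap regardless of directory
# boundaries, so the trip count depends on total entries only.
batched_trips = math.ceil(total_files / entries_per_response)

partitions_per_trip = entries_per_response / avg_files

print(f"average files per partition: {avg_files:.0f}")
print(f"partitions per batched round trip: {partitions_per_trip:.1f}")
print(f"non-batched: {non_batched_trips} trips, "
      f"{non_batched_trips * rtt_ms / 1000:.2f}s of serial network wait")
print(f"batched: {batched_trips} trips, "
      f"{batched_trips * rtt_ms / 1000:.2f}s of serial network wait")
```

Under these assumptions each batched round trip covers only a handful of 
partitions, which matches the observation above. Lowering the average files per 
partition in the sketch shrinks the batched trip count further relative to the 
per-directory count.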


> Batch listing of multiple directories
> -------------------------------------
>
>                 Key: HDFS-13616
>                 URL: https://issues.apache.org/jira/browse/HDFS-13616
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>    Affects Versions: 3.2.0
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>            Priority: Major
>         Attachments: HDFS-13616.001.patch
>
>
> One of the dominant workloads for external metadata services is listing of 
> partition directories. This can end up being bottlenecked on RTT when 
> partition directories contain a small number of files. This is fairly common, 
> since fine-grained partitioning is used for partition pruning by the query 
> engines.
> A batched listing API that takes multiple paths amortizes the RTT cost. 
> Initial benchmarks show a 10-20x improvement in metadata loading performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
