[ 
https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494327#comment-16494327
 ] 

Todd Lipcon commented on HDFS-13616:
------------------------------------

bq. In batch mode, user can submit any file system calls. All these calls will 
be sent in batch, possibly by multiple calls.

Can you clarify how that would work with an API example? For example, if you 
are the Hive planner and you have a List<Path> partitionDirs, and your end goal 
is to come up with a Map<Path, List<Path>> partitionToFiles, how would you use 
the API you're proposing?

> Batch listing of multiple directories
> -------------------------------------
>
>                 Key: HDFS-13616
>                 URL: https://issues.apache.org/jira/browse/HDFS-13616
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>    Affects Versions: 3.2.0
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>            Priority: Major
>         Attachments: BenchmarkListFiles.java, HDFS-13616.001.patch, 
> HDFS-13616.002.patch
>
>
> One of the dominant workloads for external metadata services is listing of 
> partition directories. This canĀ end up being bottlenecked on RTT time when 
> partition directories contain a small number of files. This is fairly common, 
> since fine-grained partitioning is used for partition pruning by the query 
> engines.
> A batched listing API that takes multiple paths amortizes the RTT cost. 
> Initial benchmarks show a 10-20x improvement in metadata loading performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to