[ 
https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494293#comment-16494293
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13616:
--------------------------------------------

{code}
Hi Nicholas, thanks for taking a look. Currently we don't see a need for API 
support beyond listing. The workload we're looking at is metadata loading for 
applications like Hive and Impala.
{code}
Batch delete is definitely very useful.

{code}
This batched listing API could also be combined with an async API (or a thread 
pool), so it's not an "either or" situation.
{code}
You are right that it is not "either or", although batch with async is natural.

The batchedListStatusIterator APIs in the patch are too restrictive and have 
problems; for example, the List<Path> paths parameter can only have a limited 
size (it is not a remote iterator).  How about we support a batch mode?  In 
batch mode, a user can submit any file system calls.  All these calls will be 
sent in batches, possibly split across multiple calls.
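
To make the batch mode idea concrete, here is a rough sketch of how it could look on the client side. It is only an illustration under assumptions: Batch, BatchTransport, Call and maxCallsPerRpc are hypothetical names, not part of the patch or of any existing Hadoop API, and the transport is a stand-in for a real RPC. The point is that the caller may submit an unbounded number of calls while each individual RPC stays bounded.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: none of these types exist in Hadoop or in the patch. */
public class BatchModeSketch {

    /** One queued file system call, e.g. ("listStatus", "/t/p=1"). */
    static final class Call {
        final String op;
        final String path;

        Call(String op, String path) {
            this.op = op;
            this.path = path;
        }
    }

    /** Stand-in for one RPC carrying many calls; returns one result per call, in order. */
    interface BatchTransport {
        List<String> send(List<Call> calls);
    }

    /** Client-side queue: no bound on total calls, but each RPC carries a bounded number. */
    static final class Batch {
        private final BatchTransport transport;
        private final int maxCallsPerRpc;
        private final List<Call> pending = new ArrayList<>();
        private final List<String> results = new ArrayList<>();

        Batch(BatchTransport transport, int maxCallsPerRpc) {
            if (maxCallsPerRpc < 1) {
                throw new IllegalArgumentException("maxCallsPerRpc must be >= 1");
            }
            this.transport = transport;
            this.maxCallsPerRpc = maxCallsPerRpc;
        }

        /** Queue a call; send the queue as one RPC once it reaches the per-RPC limit. */
        void submit(String op, String path) {
            pending.add(new Call(op, path));
            if (pending.size() >= maxCallsPerRpc) {
                flush();
            }
        }

        /** Send whatever is queued as a single RPC and record the results. */
        void flush() {
            if (pending.isEmpty()) {
                return;
            }
            results.addAll(transport.send(new ArrayList<>(pending)));
            pending.clear();
        }

        /** Flush any remainder and return all results in submission order. */
        List<String> finish() {
            flush();
            return results;
        }
    }

    public static void main(String[] args) {
        int[] rpcs = {0};
        // Fake transport: counts RPCs and answers every call with "ok".
        Batch batch = new Batch(calls -> {
            rpcs[0]++;
            List<String> out = new ArrayList<>();
            for (Call c : calls) {
                out.add(c.op + " " + c.path + ": ok");
            }
            return out;
        }, 2);

        for (int i = 1; i <= 4; i++) {
            batch.submit("listStatus", "/t/p=" + i);
        }
        batch.submit("delete", "/t/tmp");

        // prints: 5 results in 3 RPCs
        System.out.println(batch.finish().size() + " results in " + rpcs[0] + " RPCs");
    }
}
```

In this sketch the calls are mixed (listing and delete) and the input is not limited to a fixed-size list; an async variant could return futures from submit instead of collecting results at the end.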

> Batch listing of multiple directories
> -------------------------------------
>
>                 Key: HDFS-13616
>                 URL: https://issues.apache.org/jira/browse/HDFS-13616
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>    Affects Versions: 3.2.0
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>            Priority: Major
>         Attachments: BenchmarkListFiles.java, HDFS-13616.001.patch, 
> HDFS-13616.002.patch
>
>
> One of the dominant workloads for external metadata services is listing of 
> partition directories. This can end up being bottlenecked on RTT time when 
> partition directories contain a small number of files. This is fairly common, 
> since fine-grained partitioning is used for partition pruning by the query 
> engines.
> A batched listing API that takes multiple paths amortizes the RTT cost. 
> Initial benchmarks show a 10-20x improvement in metadata loading performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
