[ 
https://issues.apache.org/jira/browse/HDFS-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474010#comment-16474010
 ] 

Yiqun Lin commented on HDFS-13398:
----------------------------------

Hi [~ajaysachdev], I just took a quick look at the v001 patch. One comment 
from me:

In the v001 patch, we introduced a ForkJoinPool to execute commands 
asynchronously. Since this change is made in the Command class, it will also 
apply to other fs sub-commands, not only the -du and -ls commands. Is it 
thread-safe to run the other fs commands this way?
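
To make the concern concrete, here is a minimal sketch of one way a recursive 
listing can be parallelized with a ForkJoinPool without sharing mutable state 
between threads. It is not taken from the v001 patch and does not use the 
Hadoop Command or FileSystem classes; it walks a local directory with 
java.nio.file, and the names ParallelListingSketch, ListTask and 
listRecursively are hypothetical. Each task collects results in its own list 
and the lists are merged only after join(), so nothing is printed or counted 
from more than one thread at a time.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative only: all names here are hypothetical, not from the patch.
public class ParallelListingSketch {

    /** Lists one directory and forks a subtask for every child directory. */
    static final class ListTask extends RecursiveTask<List<Path>> {
        private final Path dir;

        ListTask(Path dir) {
            this.dir = dir;
        }

        @Override
        protected List<Path> compute() {
            List<Path> found = new ArrayList<>();        // task-local, never shared
            List<ListTask> subtasks = new ArrayList<>();
            try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
                for (Path child : children) {
                    found.add(child);
                    // Do not follow symlinks, to avoid looping on link cycles.
                    if (Files.isDirectory(child, LinkOption.NOFOLLOW_LINKS)) {
                        ListTask task = new ListTask(child);
                        task.fork();                     // list the subdirectory asynchronously
                        subtasks.add(task);
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            // Merge child results in this task only, after each child has finished.
            for (ListTask task : subtasks) {
                found.addAll(task.join());
            }
            return found;
        }
    }

    /** Runs the listing on a dedicated pool and returns every entry below root. */
    static List<Path> listRecursively(Path root, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new ListTask(root));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> entries = listRecursively(root, 4);
        System.out.println(entries.size() + " entries under " + root);
    }
}
```

A command that instead updates instance fields or writes to its output stream 
from inside compute() would need synchronization, which is why moving the 
pool into the shared Command class deserves a check for each sub-command.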

In addition, it looks like this JIRA should be moved to Hadoop Common.

> Hdfs recursive listing operation is very slow
> ---------------------------------------------
>
>                 Key: HDFS-13398
>                 URL: https://issues.apache.org/jira/browse/HDFS-13398
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.7.1
>         Environment: HCFS file system where HDP 2.6.1 is connected to ECS 
> (Object Store).
>            Reporter: Ajay Sachdev
>            Assignee: Ajay Sachdev
>            Priority: Major
>             Fix For: 2.7.1
>
>         Attachments: HDFS-13398.001.patch, parallelfsPatch
>
>
> The hdfs dfs -ls -R command is sequential in nature and is very slow for an 
> HCFS system. We have seen around 6 mins for a structure of 40K 
> directories/files.
> The proposal is to use a multithreaded approach to speed up recursive list, 
> du and count operations.
> We have tried a ForkJoinPool implementation to improve performance for the 
> recursive listing operation.
> [https://github.com/jasoncwik/hadoop-release/tree/parallel-fs-cli]
> commit id : 
> 82387c8cd76c2e2761bd7f651122f83d45ae8876
> Another implementation uses the Java ExecutorService to run the listing 
> operation in multiple threads in parallel. This has significantly reduced 
> the time from 6 mins to 40 secs.
>  
>  



