[
https://issues.apache.org/jira/browse/HDFS-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474335#comment-16474335
]
Ajay Sachdev commented on HDFS-13398:
-------------------------------------
Hi Yiqun,
Thanks for taking the time to look at the patch. I have also uploaded a second
version of the patch against Apache trunk, and I would appreciate it if you
could provide feedback on that version as well. This patch is intended for a
customer of ours who uses the ViPRFS interface and only wants to run the
-ls -R, -du and -count commands of FsShell. I agree that if any sub-command has
mutable shared state, such as usagesTable in the FsUsage.java class, then we
will wrap a synchronized block around it to make it thread-safe. As a long-term
solution, we would like to optimize all FsShell commands to make use of this
ForkJoinPool framework, and then each sub-command implementation can ensure
that any shared state is thread-safe.
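To illustrate the kind of synchronized wrapping meant here, below is a minimal
sketch and not the code in the patch. UsageTable is a hypothetical stand-in for
shared state such as usagesTable, and SharedStateSketch, AddRows and fill are
names invented for this example. Several ForkJoinPool tasks append rows to one
shared list, and every access goes through the same lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical stand-in for shared mutable state such as usagesTable;
// this is NOT the actual FsUsage implementation.
class UsageTable {
    private final List<String> rows = new ArrayList<>();

    // Every access to the shared list takes the same lock.
    void addRow(String row) {
        synchronized (rows) {
            rows.add(row);
        }
    }

    int size() {
        synchronized (rows) {
            return rows.size();
        }
    }
}

class SharedStateSketch {
    // Adds rows [from, to) to the table, splitting large ranges in two.
    static class AddRows extends RecursiveAction {
        private final UsageTable table;
        private final int from;
        private final int to;

        AddRows(UsageTable table, int from, int to) {
            this.table = table;
            this.from = from;
            this.to = to;
        }

        @Override
        protected void compute() {
            if (to - from <= 100) {
                for (int i = from; i < to; i++) {
                    table.addRow("row-" + i);
                }
                return;
            }
            int mid = from + (to - from) / 2;
            invokeAll(new AddRows(table, from, mid), new AddRows(table, mid, to));
        }
    }

    // Fills a fresh table from several worker threads and returns its size.
    static int fill(int n) {
        UsageTable table = new UsageTable();
        ForkJoinPool pool = new ForkJoinPool(4);
        try {
            pool.invoke(new AddRows(table, 0, n));
        } finally {
            pool.shutdown();
        }
        return table.size();
    }

    public static void main(String[] args) {
        System.out.println(fill(10_000)); // prints 10000
    }
}
```

Without the synchronized blocks, concurrent add() calls on a plain ArrayList
can lose rows or throw, which is the risk with any sub-command that keeps
state like this.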
Thanks
Ajay
> Hdfs recursive listing operation is very slow
> ---------------------------------------------
>
> Key: HDFS-13398
> URL: https://issues.apache.org/jira/browse/HDFS-13398
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 2.7.1
> Environment: HCFS file system where HDP 2.6.1 is connected to ECS
> (Object Store).
> Reporter: Ajay Sachdev
> Assignee: Ajay Sachdev
> Priority: Major
> Fix For: 2.7.1
>
> Attachments: HDFS-13398.001.patch, HDFS-13398.002.patch,
> parallelfsPatch
>
>
> The hdfs dfs -ls -R command is sequential in nature and is very slow on an
> HCFS system. We have seen it take around 6 minutes for a structure of 40K
> directories and files. The proposal is to use a multithreaded approach to
> speed up the recursive ls, du and count operations.
> We have tried a ForkJoinPool implementation to improve the performance of
> the recursive listing operation.
> [https://github.com/jasoncwik/hadoop-release/tree/parallel-fs-cli]
> commit id :
> 82387c8cd76c2e2761bd7f651122f83d45ae8876
> Another implementation uses the Java ExecutorService to run the listing
> operation in multiple threads in parallel. This has significantly reduced
> the time from 6 minutes to 40 seconds.
>
>
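The ForkJoinPool approach in the description above can be sketched as follows.
This is only an illustration under stated assumptions, not the linked commit or
the attached patches: it walks a local directory tree through java.nio.file
instead of the Hadoop FileSystem API, and ParallelListingSketch, ListDir and
listRecursively are hypothetical names. Each task lists one directory and forks
a subtask per subdirectory, so slow listing calls on sibling directories can
overlap.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ParallelListingSketch {
    // Lists one directory, then forks a subtask for each subdirectory.
    static class ListDir extends RecursiveTask<List<Path>> {
        private final Path dir;

        ListDir(Path dir) {
            this.dir = dir;
        }

        @Override
        protected List<Path> compute() {
            List<Path> found = new ArrayList<>();
            List<ListDir> subtasks = new ArrayList<>();
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path entry : entries) {
                    found.add(entry);
                    // Do not follow symlinks, to avoid cycles in the walk.
                    if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                        ListDir sub = new ListDir(entry);
                        sub.fork(); // schedule the subdirectory on the pool
                        subtasks.add(sub);
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            // Collect the results of the forked subtasks.
            for (ListDir sub : subtasks) {
                found.addAll(sub.join());
            }
            return found;
        }
    }

    // Returns every entry below root (root itself is not included).
    static List<Path> listRecursively(Path root, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new ListDir(root));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : listRecursively(root, 8)) {
            System.out.println(p);
        }
    }
}
```

Each task keeps its results in a local list and merges them on join, so the
traversal itself needs no shared mutable state. The order of the returned
entries depends on the directory stream and is not sorted.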