[
https://issues.apache.org/jira/browse/HADOOP-14698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436296#comment-16436296
]
Jason Cwik commented on HADOOP-14698:
-------------------------------------
As mentioned above in
https://issues.apache.org/jira/browse/HADOOP-14698?focusedCommentId=16107552&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16107552
the current threading model only applies to the leaf nodes. In deep or wide
tree structures, the enumeration itself can take a significant amount of time,
especially with FileSystem implementations such as S3A or other object store
connectors. I started a patch in HDFS-13398 to address this (especially for
the `ls` and `du` commands), but it could likely be combined with this effort
to parallelize the FsShell module in general.
So far, we've tried two approaches. The first simply creates another executor
in the base class and enqueues the child operations in processPaths. The
second uses a ForkJoinPool (FJP) to crawl the tree and process subtrees in
parallel. Currently, we have FJP working with `ls` and `du`, but not with other
operations. I think FJP is the best route, since it would let us do things
like wait to delete a directory until all its children have been deleted.
Doing this properly, however, might require a significant refactoring of the
whole FsShell module to implement the correct ForkJoinTask structure.
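To make the ForkJoinPool idea concrete, here is a minimal sketch of a `du`-style
crawl. It is not the HDFS-13398 patch: it assumes the local filesystem via
java.nio.file rather than Hadoop's FileSystem API so that it runs standalone,
and the names DuTask and du are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Hypothetical du-style task: sums the sizes of all files under a directory. */
class DuTask extends RecursiveTask<Long> {
    private final Path dir;

    DuTask(Path dir) {
        this.dir = dir;
    }

    @Override
    protected Long compute() {
        long total = 0;
        List<DuTask> subtasks = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                    // Fork one task per subdirectory so subtrees are crawled in parallel.
                    DuTask child = new DuTask(entry);
                    child.fork();
                    subtasks.add(child);
                } else {
                    // Non-directory entries are sized inline in the current task.
                    total += Files.size(entry);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Join point: every child subtree has finished once this loop completes.
        for (DuTask child : subtasks) {
            total += child.join();
        }
        return total;
    }

    /** Runs the crawl on the common pool and returns the total size in bytes. */
    static long du(Path root) {
        return ForkJoinPool.commonPool().invoke(new DuTask(root));
    }
}
```

The loop over `join()` is the point where a delete-style task could safely
remove the directory itself, since all of its children are known to be done.
Note that these tasks block on I/O, so a dedicated pool sized for the target
store may be a better fit than the common pool.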
Thoughts?
> Make copyFromLocal's -t option available for put as well
> --------------------------------------------------------
>
> Key: HADOOP-14698
> URL: https://issues.apache.org/jira/browse/HADOOP-14698
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Andras Bokor
> Assignee: Andras Bokor
> Priority: Major
> Attachments: HADOOP-14698.01.patch, HADOOP-14698.02.patch,
> HADOOP-14698.03.patch, HADOOP-14698.04.patch, HADOOP-14698.05.patch,
> HADOOP-14698.06.patch, HADOOP-14698.07.patch, HADOOP-14698.08.patch
>
>
> After HDFS-11786 copyFromLocal and put are no longer identical.
> I do not see any reason not to add the new feature to put as well.
> Having the two commands differ makes them harder to understand and use
> from the user's point of view.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)