[
https://issues.apache.org/jira/browse/HADOOP-14086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869708#comment-15869708
]
Steve Loughran commented on HADOOP-14086:
-----------------------------------------
# target version will have to be branch-2+, with backports as people feel
appropriate
# please don't make things worse for object stores. One thing we've started
doing there is massively boosting the performance of listFiles(path,
recursive=true), which goes from being a slow emulation of a recursive
treewalk to O(1 + files/5000) calls. If you could use that to iterate over the
LocatedFileStatus entries and hand that status data directly to the
workers, it would be great for object stores while still delivering good
NameNode perf.
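
A rough sketch of the idea in the second point, not DistCp code. In Hadoop the
relevant call is FileSystem.listFiles(path, true), which returns a
RemoteIterator<LocatedFileStatus>. To stay runnable without Hadoop on the
classpath, the sketch below uses a plain java.util.Iterator as a stand-in, and
the class and method names (FlatListingSketch, flatListingCalls,
batchForWorkers) are hypothetical. The page size of 5000 is taken from the
comment above. The batch size is an arbitrary illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Toy model only: counts listing requests and batches entries; no real Hadoop or object-store calls. */
final class FlatListingSketch {

    /** Requests needed for one flat, paged listing: ceil(files / pageSize), and at least 1. */
    static long flatListingCalls(long files, long pageSize) {
        if (files < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("files must be >= 0 and pageSize > 0");
        }
        return Math.max(1, (files + pageSize - 1) / pageSize);
    }

    /**
     * Drains a single listing iterator into fixed-size batches, so that each
     * worker receives the status entries directly instead of looking them up again.
     */
    static <T> List<List<T>> batchForWorkers(Iterator<T> listing, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        List<T> current = new ArrayList<>(batchSize);
        while (listing.hasNext()) {
            current.add(listing.next());
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<>(batchSize);
            }
        }
        if (!current.isEmpty()) {
            batches.add(current); // final, possibly short, batch
        }
        return batches;
    }

    public static void main(String[] args) {
        // One million files at 5000 entries per page.
        System.out.println(flatListingCalls(1_000_000L, 5000L));

        // Stand-in for the entries a recursive listing would yield.
        List<String> entries = Arrays.asList("a/1", "a/2", "b/1", "b/2", "c/1");
        System.out.println(batchForWorkers(entries.iterator(), 2));
    }
}
```

The point of the batching step is that the listing is walked once and each
worker gets the status data with its batch, rather than every worker asking
the filesystem for the same metadata a second time.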
> Improve DistCp Speed for small files
> ------------------------------------
>
> Key: HADOOP-14086
> URL: https://issues.apache.org/jira/browse/HADOOP-14086
> Project: Hadoop Common
> Issue Type: Improvement
> Components: tools/distcp
> Affects Versions: 2.6.5
> Reporter: Zheng Shao
> Assignee: Zheng Shao
> Priority: Minor
>
> When using distcp to copy lots of small files, NameNode naturally becomes a
> bottleneck.
> The current distcp code does *not* try to reduce the number of NameNode
> calls. We should restructure the code to make as few NameNode calls as
> possible to speed up the copy of small files.