[ https://issues.apache.org/jira/browse/HADOOP-14086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886105#comment-15886105 ]
Erik Krogen commented on HADOOP-14086:
--------------------------------------
[~zhz], currently multiple calls are made for each file; even reducing a
distcp of 1M files to 1M {{getFileInfo}} calls would be a big improvement over
the current implementation.
[[email protected]], what about this JIRA makes you worry that object store
performance will be worse? Nothing stands out to me, so I am curious. Also, are
you saying that the listFiles performance work is already done, or still in
progress? Do you have a JIRA link?
> Improve DistCp Speed for small files
> ------------------------------------
>
> Key: HADOOP-14086
> URL: https://issues.apache.org/jira/browse/HADOOP-14086
> Project: Hadoop Common
> Issue Type: Improvement
> Components: tools/distcp
> Affects Versions: 2.6.5
> Reporter: Zheng Shao
> Assignee: Zheng Shao
> Priority: Minor
>
> When using distcp to copy lots of small files, the NameNode naturally becomes a
> bottleneck.
> The current distcp code is *not* optimized to reduce NameNode calls. We
> should restructure the code to reduce the number of NameNode calls as much as
> possible to speed up the copying of small files.