[ https://issues.apache.org/jira/browse/HADOOP-14086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886105#comment-15886105 ]

Erik Krogen commented on HADOOP-14086:
--------------------------------------

[~zhz], currently there are multiple NameNode calls made for each file; even 
reducing a distcp of 1M files to 1M {{getFileInfo}} calls would be a big 
improvement over the current implementation.

[[email protected]], what about this JIRA makes you worry that object store 
performance will get worse? Nothing stands out to me, so I am curious. Also, are 
you saying that the listFiles performance work is already done, or still in 
progress? Do you have a JIRA link?

> Improve DistCp Speed for small files
> ------------------------------------
>
>                 Key: HADOOP-14086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14086
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>    Affects Versions: 2.6.5
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Minor
>
> When using distcp to copy lots of small files, the NameNode naturally becomes 
> a bottleneck.
> The current distcp code does *not* try to minimize NameNode calls. We should 
> restructure the code to reduce the number of NameNode calls as much as 
> possible to speed up the copying of small files.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
