[ 
https://issues.apache.org/jira/browse/HADOOP-16872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liuxiaolong updated HADOOP-16872:
---------------------------------
    Attachment: optimise after.png

> Performance improvement when distcp files in large dir with -direct option
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-16872
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16872
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: liuxiaolong
>            Priority: Major
>         Attachments: HADOOP-16872.001.patch, optimise after.png, optimise 
> before.png
>
>
> We use distcp with the -direct option to copy a single file between two 
> large directories, and the job takes a few minutes. When too many such 
> distcp jobs run at the same time, NameNode performance degrades seriously.
> hadoop distcp -direct -skipcrccheck -update -prbugaxt -i -numListstatusThreads 1 
> hdfs://cluster1:8020/source/100.log  hdfs://cluster2:8020/target/100.jpg
> || ||Dir path||Count||
> ||Source dir||  hdfs://cluster1:8020/source/ ||100k+ files||
> ||Target dir||hdfs://cluster2:8020/target/ ||100k+  files||
>  
> Looking at CopyCommitter.java, we found that the function
> deleteAttemptTempFiles() calls targetFS.globStatus(new 
> Path(targetWorkPath, ".distcp.tmp." + jobId.replaceAll("job","attempt") + 
> "*")); 
> This call wastes a lot of time when distcp copies between two large 
> directories. When distcp runs with the -direct option, it writes directly 
> to the target file without generating a '.distcp.tmp' temp file. So I think 
> deleteAttemptTempFiles() needs a check: if distcp runs with the -direct 
> option, do nothing and return immediately.
>  
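
The proposed check can be sketched roughly as below. This is a minimal,
self-contained illustration and not the code in HADOOP-16872.001.patch: the
class DirectWriteCleanupSketch, the TargetFs interface, the CountingFs helper
and the directWrite parameter are all hypothetical stand-ins for the real
Hadoop FileSystem and DistCp classes. Only the glob pattern is taken from the
description above.

```java
import java.util.ArrayList;
import java.util.List;

class DirectWriteCleanupSketch {

    /** Hypothetical stand-in for the target file system; not the Hadoop FileSystem API. */
    interface TargetFs {
        /** Returns the paths matching a glob pattern (expensive on large directories). */
        List<String> globStatus(String pattern);

        void delete(String path);
    }

    /** Builds the temp-file glob as quoted in the issue description. */
    static String tempFileGlob(String targetWorkPath, String jobId) {
        return targetWorkPath + "/.distcp.tmp." + jobId.replaceAll("job", "attempt") + "*";
    }

    /**
     * Deletes leftover temp files and returns how many were removed.
     * With directWrite set, no temp files were created, so the glob is skipped.
     */
    static int deleteAttemptTempFiles(TargetFs fs, String targetWorkPath,
                                      String jobId, boolean directWrite) {
        if (directWrite) {
            return 0; // proposed guard: nothing to clean up, so do not list the directory
        }
        int deleted = 0;
        for (String path : fs.globStatus(tempFileGlob(targetWorkPath, jobId))) {
            fs.delete(path);
            deleted++;
        }
        return deleted;
    }

    /** In-memory TargetFs that counts glob calls; for demonstration only. */
    static class CountingFs implements TargetFs {
        final List<String> files = new ArrayList<>();
        int globCalls = 0;

        @Override
        public List<String> globStatus(String pattern) {
            globCalls++;
            // This sketch only supports patterns that end in a single "*".
            String prefix = pattern.substring(0, pattern.length() - 1);
            List<String> matches = new ArrayList<>();
            for (String f : files) {
                if (f.startsWith(prefix)) {
                    matches.add(f);
                }
            }
            return matches;
        }

        @Override
        public void delete(String path) {
            files.remove(path);
        }
    }

    public static void main(String[] args) {
        CountingFs fs = new CountingFs();
        fs.files.add("/target/100.jpg");
        int deleted = deleteAttemptTempFiles(fs, "/target", "job_1_0001", true);
        System.out.println(deleted + " deleted, " + fs.globCalls + " glob calls");
    }
}
```

With directWrite set to true the sketch never calls globStatus, which is the
saving the issue is asking for. With it set to false the existing cleanup
behaviour is unchanged.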



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

