[ https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832541#action_12832541 ]
Hadoop QA commented on MAPREDUCE-1305:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12435423/M1305-2.patch
  against trunk revision 908321.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/441/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/441/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/441/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/441/console

This message is automatically generated.

> Running distcp with -delete incurs avoidable penalties
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-1305
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distcp
>    Affects Versions: 0.20.1
>            Reporter: Peter Romianowski
>            Assignee: Peter Romianowski
>         Attachments: M1305-1.patch, M1305-2.patch, MAPREDUCE-1305.patch
>
>
> *First problem*
> In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus
> objects when the path is all we need.
> The performance problem comes from
> org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write which tries
> to retrieve file permissions by issuing a "ls -ld <path>" which is painfully
> slow.
> Changed that to just serialize Path and not FileStatus.
> *Second problem*
> To delete the files we invoke the "hadoop" command line tool with option
> "-rmr <path>". Again, for each file.
> Changed that to dstfs.delete(path, true)
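
For readers who want the shape of the two changes without opening the patch, here is a rough, stdlib-only sketch. It is not the code from M1305-2.patch. DeleteSketch, DstFs, writePaths, readPaths and deleteAll are hypothetical names made up for illustration. DstFs stands in for the single FileSystem#delete(Path, boolean) call that the description mentions, and plain strings stand in for Path. The idea is to write only the path for each entry, so no per-file status lookup is triggered, and then to delete each entry with an in-process call rather than launching the "hadoop" command once per file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DeleteSketch {
    /** Hypothetical stand-in for the one file system call the fix relies on. */
    interface DstFs {
        boolean delete(String path, boolean recursive) throws IOException;
    }

    /** First problem: record only the path strings, not a full status object. */
    static byte[] writePaths(List<String> paths) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(paths.size());
            for (String p : paths) {
                out.writeUTF(p);
            }
        }
        return buf.toByteArray();
    }

    /** Read back the path list written by writePaths. */
    static List<String> readPaths(byte[] data) throws IOException {
        List<String> paths = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                paths.add(in.readUTF());
            }
        }
        return paths;
    }

    /** Second problem: one in-process recursive delete per path, no child process. */
    static int deleteAll(DstFs dstfs, List<String> paths) throws IOException {
        int deleted = 0;
        for (String p : paths) {
            if (dstfs.delete(p, true)) {
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        List<String> stale = Arrays.asList("/dst/old/a", "/dst/old/b");
        List<String> seen = new ArrayList<>();
        // Fake file system that only records what it was asked to delete.
        DstFs fake = (path, recursive) -> seen.add(path);
        int n = deleteAll(fake, readPaths(writePaths(stale)));
        System.out.println(n + " paths deleted: " + seen);
    }
}
```

In real DistCp code the DstFs role would be played by the destination org.apache.hadoop.fs.FileSystem instance (dstfs in the description above). The fake used in main is only there so the sketch runs without a cluster.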