[
https://issues.apache.org/jira/browse/HADOOP-11785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391095#comment-14391095
]
Hadoop QA commented on HADOOP-11785:
------------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12708737/distcp-liststatus.patch
against trunk revision 4922394.
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with
eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in
hadoop-tools/hadoop-distcp.
Test results:
https://builds.apache.org/job/PreCommit-HADOOP-Build/6041//testReport/
Console output:
https://builds.apache.org/job/PreCommit-HADOOP-Build/6041//console
This message is automatically generated.
> Reduce number of listStatus operation in distcp buildListing()
> --------------------------------------------------------------
>
> Key: HADOOP-11785
> URL: https://issues.apache.org/jira/browse/HADOOP-11785
> Project: Hadoop Common
> Issue Type: Improvement
> Components: tools/distcp
> Affects Versions: 3.0.0
> Reporter: Zoran Dimitrijevic
> Assignee: Zoran Dimitrijevic
> Priority: Minor
> Attachments: distcp-liststatus.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Distcp was taking a long time in copyListing.buildListing() for large source
> trees (I was using a source of 1.5M files in a tree of about 50K directories).
> For input on S3, buildListing was taking more than one hour. I noticed a
> performance bug in the current code: it calls listStatus twice for each
> directory, which doubles the number of RPCs in some cases (if most directories
> do not contain >1000 files).
>
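
The idea in the description (list each directory once and reuse the result when descending) can be illustrated with a minimal sketch. This is not the attached patch and does not use the Hadoop API: `Lister`, `CountingLister`, and `buildListing` below are hypothetical stand-ins, each `list()` call models one listStatus RPC, and directories are assumed to be marked with a trailing "/".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;

public class SingleListingTraversal {
    /** Hypothetical stand-in for a file system client; one list() call models one RPC. */
    interface Lister {
        List<String> list(String dir); // child paths; directories end with "/"
    }

    /** In-memory Lister that counts how many listing calls were made. */
    static class CountingLister implements Lister {
        final Map<String, List<String>> tree;
        int calls = 0;

        CountingLister(Map<String, List<String>> tree) {
            this.tree = tree;
        }

        @Override
        public List<String> list(String dir) {
            calls++;
            return tree.getOrDefault(dir, Collections.emptyList());
        }
    }

    /** Walks the tree, listing each directory exactly once and reusing that result. */
    static List<String> buildListing(Lister fs, String root) {
        List<String> out = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String dir = pending.pop();
            // Single listing call per directory: the same result is used both to
            // record the entries and to discover subdirectories to visit later.
            for (String child : fs.list(dir)) {
                out.add(child);
                if (child.endsWith("/")) {
                    pending.push(child);
                }
            }
        }
        return out;
    }
}
```

With this shape, the number of listing calls equals the number of directories visited, so a tree of about 50K directories would need about 50K calls rather than twice that.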