[jira] [Commented] (HADOOP-9475) Distcp issue

Steve Loughran (JIRA) Sat, 13 Apr 2013 07:32:16 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-9475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631062#comment-13631062
 ]


Steve Loughran commented on HADOOP-9475:
----------------------------------------

I'm afraid I'm going to have to close this as an invalid issue unless you can 
show that there's a bug in DistCp that surfaces on your infrastructure.

http://wiki.apache.org/hadoop/InvalidJiraIssues

# DistCp works for everybody else, fast enough to bring down network links 
between sites if you aren't careful.
# It is implemented as an MR job run on the source cluster, where mappers copy 
files.

If it doesn't work for you, then there's probably something wrong with your 
cluster or network:
* the bandwidth between clusters is lower than you expect
* you are limited by the number of mappers you can run with DistCp (DistCp conf 
or cluster setup)
* you have lots and lots of small files
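
As a rough illustration of how those three factors interact, here is a 
back-of-envelope sketch in Python. It is not part of DistCp; the function name, 
its parameters and the simple model behind it (throughput capped by the slower 
of the link and the mappers, plus a fixed per-file cost) are all assumptions 
made for the example.

```python
def estimate_copy_seconds(total_bytes, file_count, link_bytes_per_sec,
                          mappers, per_mapper_bytes_per_sec,
                          per_file_overhead_sec):
    """Rough estimate of copy wall-clock time under a simplified model.

    All parameters are hypothetical inputs that you would have to measure
    on your own cluster and network.
    """
    # Aggregate throughput is capped by the inter-cluster link and by what
    # the available mappers can push in parallel, whichever is lower.
    throughput = min(link_bytes_per_sec, mappers * per_mapper_bytes_per_sec)
    transfer_sec = total_bytes / throughput

    # Every file carries a fixed cost (open, create, close) regardless of
    # size; that cost is spread across the mappers.
    overhead_sec = file_count * per_file_overhead_sec / mappers

    return transfer_sec + overhead_sec
```

With many small files the per-file term dominates, and with few mappers or a 
slow link the transfer term does, which is why both are worth measuring 
separately.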

If you can show that your cluster and the network have the capacity to copy 
large files (hint: use one of the many Linux command line network bandwidth 
test tools to measure that bandwidth before going near Hadoop), then consider 
filing a bug. Even then, as nobody else is seeing it, you are going to have to 
be the person to debug & fix it. 
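
One number that can be compared against a measured link bandwidth is the 
throughput of a single map task. A minimal sketch in Python, assuming the 
timestamp format seen in the quoted task log; the function names are made up 
for the example, and the byte count is something you would have to supply 
yourself (for instance from the job counters), since the log does not show it.

```python
from datetime import datetime

# Timestamp layout used by the task log lines, e.g. "2013-04-13 05:11:43,981".
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start, end):
    """Seconds between two log timestamps given as strings."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    finished = datetime.strptime(end, LOG_TIME_FORMAT)
    return (finished - started).total_seconds()


def throughput_mb_per_sec(bytes_copied, seconds):
    """Average throughput in MB/s; bytes_copied must come from elsewhere."""
    return bytes_copied / seconds / (1024 * 1024)


# Gap between the last start-up line and the "is done" line of the quoted task.
task_seconds = elapsed_seconds("2013-04-13 05:11:43,981",
                               "2013-04-13 05:31:08,282")
```

If the per-task figure multiplied by the number of concurrent mappers is well 
below what the network test tools report, the cluster side is the place to 
look; if it is close, the link is the limit.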
                
> Distcp issue
> ------------
>
>                 Key: HADOOP-9475
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9475
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Sambit Sahoo
>
> 2013-04-13 05:11:43,327 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded 
> the native-hadoop library
> 2013-04-13 05:11:43,439 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
> Initializing JVM Metrics with processName=MAP, sessionId=
> 2013-04-13 05:11:43,750 INFO org.apache.hadoop.mapred.MapTask: 
> numReduceTasks: 0
> 2013-04-13 05:11:43,981 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: 
> Successfully loaded & initialized native-zlib library
> 2013-04-13 05:31:08,282 INFO org.apache.hadoop.mapred.Task: 
> Task:attempt_201302011155_224614_m_000009_0 is done. And is in the process of 
> commiting
> 2013-04-13 05:31:09,359 INFO org.apache.hadoop.mapred.Task: Task 
> attempt_201302011155_224614_m_000009_0 is allowed to commit now
> 2013-04-13 05:31:09,937 INFO org.apache.hadoop.mapred.FileOutputCommitter: 
> Saved output of task 'attempt_201302011155_224614_m_000009_0' to 
> /tmp/_distcp_logs_spti36
> 2013-04-13 05:31:09,939 INFO org.apache.hadoop.mapred.Task: Task 
> 'attempt_201302011155_224614_m_000009_0' done.
> 2013-04-13 05:31:09,942 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
> Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> I am facing some delay during distcp from one cluster to another.
> Here I am copying snappy compressed data.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
