[
https://issues.apache.org/jira/browse/HADOOP-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Douglas reopened HADOOP-2725:
-----------------------------------
I reverted this patch because its test case (TestCopyFiles) took nearly 400s
(up from 26s) on my machine, due to a silently failing local-to-local test case.
All 20 files copy successfully, but each then fails in the rename:
{noformat}
2008-02-08 18:36:14,246 INFO util.CopyFiles (CopyFiles.java:map(390)) - FAIL 2522487525519213817 : java.io.IOException: Fail to rename tmp file (=file:/path/build/test/data/destdat/_distcp_tmp_cq5yoa/2522487525519213817) to destination file (=file:/path/build/test/data/destdat/2522487525519213817)
    at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.rename(CopyFiles.java:336)
    at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.copy(CopyFiles.java:317)
    at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.map(CopyFiles.java:382)
    at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.map(CopyFiles.java:202)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:132)
Caused by: java.io.IOException: Target file:/path/build/test/data/destdat/.2522487525519213817.crc already exists
    at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:269)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:133)
    at org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:211)
    at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:403)
    at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.rename(CopyFiles.java:333)
    ... 6 more
{noformat}
At a glance, this looks like a problem in LocalFileSystem, but I'm reverting
this patch for now.
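For reference, a minimal standalone sketch of the situation the stack trace describes. It is not part of TestCopyFiles; the class name, scratch directory, and file contents are placeholders, and whether the rename actually fails this way depends on the RawLocalFileSystem rename/copy fallback, so treat it as an illustration of the reported failure mode rather than a guaranteed reproduction:
{code:java}
// Illustration only: set up the same situation the stack trace reports, i.e. a
// rename onto a destination whose checksum file already exists on the
// checksummed LocalFileSystem. Names and paths are placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;

public class RenameOverExistingCrc {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // LocalFileSystem is a ChecksumFileSystem over RawLocalFileSystem, so every
    // file gets a hidden ".name.crc" checksum written next to it.
    LocalFileSystem fs = FileSystem.getLocal(conf);

    Path dir = new Path("/tmp/distcp-rename-sketch");             // placeholder scratch dir
    Path dst = new Path(dir, "2522487525519213817");              // destination file
    Path tmp = new Path(dir, "_distcp_tmp/2522487525519213817");  // distcp-style tmp file

    // Write the destination once, leaving dst and .2522487525519213817.crc behind.
    FSDataOutputStream out = fs.create(dst);
    out.writeBytes("first attempt");
    out.close();

    // Write the "real" copy into the tmp dir, then rename it onto dst,
    // the step CopyFiles$FSCopyFilesMapper.rename performs.
    out = fs.create(tmp);
    out.writeBytes("second attempt");
    out.close();

    // Per the trace above, this rename can end in
    // "java.io.IOException: Target ...crc already exists" when the checksum
    // rename falls back to FileUtil.copy and hits FileUtil.checkDest.
    boolean renamed = fs.rename(tmp, dst);
    System.out.println("rename returned: " + renamed);
  }
}
{code}
If that is indeed what is happening, removing the destination (and thereby its stale checksum) before the rename, or having ChecksumFileSystem.rename clean up the old checksum, would be possible fixes; I have not verified either.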
> Distcp truncates some files when copying
> ----------------------------------------
>
> Key: HADOOP-2725
> URL: https://issues.apache.org/jira/browse/HADOOP-2725
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs, util
> Affects Versions: 0.16.0
> Environment: Nightly build:
> http://hadoopqa.yst.corp.yahoo.com:8080/hudson/job/Hadoop-LinuxTest/770/
> With patches for HADOOP-2095 and HADOOP-2119.
> Reporter: Murtaza A. Basrai
> Assignee: Tsz Wo (Nicholas), SZE
> Priority: Critical
> Fix For: 0.16.1
>
> Attachments: 2725_20080206.patch, 2725_20080208.patch
>
>
> We used distcp to copy ~100 TB of data between two clusters of ~1400 nodes each.
> Command used (it was run on the src cluster):
> hadoop distcp -log /logdir/logfile hdfs://src-namenode:8600//src-dir-1
> hdfs://src-namenode:8600//src-dir-2 ... hdfs://src-namenode:8600//src-dir-n
> hdfs://tgt-namenode:8600//dst-dir
> Distcp completed without errors, but when we checked the file sizes on the
> src and tgt clusters, we noticed differences in file sizes for 9 files (~6
> GB).
> src-file-1 666762714 bytes -> tgt-file-1 134217728 bytes
> src-file-2 673791814 bytes -> tgt-file-2 536870912 bytes
> src-file-3 692172075 bytes -> tgt-file-3 0 bytes
> All 9 mismatched target files are truncated at a block boundary (some have size 0).
> I looked at the log files, and noticed a few things:
> 1. There are 31059 log files (same as the number of Maps the job had).
> 2. 246 of the log files are non-empty.
> 3. All non-empty log files are of the form:
> SKIP: hdfs://src-namenode/src-dir-a/src-file-x
> SKIP: hdfs://src-namenode/src-dir-b/src-file-y
> SKIP: hdfs://src-namenode/src-dir-c/src-file-z
> 4. All 9 files that were truncated appear in the log files as skipped files.
> 5. All 9 files were the last entry in their respective log files.
> e.g.
> Non-empty logfile 1:
> SKIP: hdfs://src-namenode/src-dir-a/src-file-x
> SKIP: hdfs://src-namenode/src-dir-b/src-file-y
> SKIP: hdfs://src-namenode/src-dir-c/src-file-z <-- Truncated file
> Non-empty logfile 2:
> SKIP: hdfs://src-namenode/src-dir-p/src-file-m
> SKIP: hdfs://src-namenode/src-dir-q/src-file-n <-- Truncated file
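To make the mechanism the SKIP pattern suggests concrete: the truncation would follow if a map attempt killed mid-copy left a partial destination file behind, and a later attempt then skipped the file simply because the destination already existed. Below is a hypothetical sketch of such an existence-only skip check, together with a stricter length-comparing variant; this is not the actual CopyFiles code, and the class and method names are made up.
{code:java}
// Hypothetical sketch of an existence-only skip check and a stricter
// length-comparing variant; not the actual CopyFiles implementation.
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SkipCheckSketch {
  // Skip whenever the destination exists. If an earlier, killed attempt left a
  // partial file behind, this reports SKIP and the truncated file survives.
  static boolean skipIfExists(FileSystem dstFs, Path dst) throws IOException {
    return dstFs.exists(dst);
  }

  // Stricter variant: only skip when the destination also matches the source
  // length, so a file truncated at a block boundary would be re-copied.
  static boolean skipIfSameLength(FileSystem srcFs, Path src,
                                  FileSystem dstFs, Path dst) throws IOException {
    return dstFs.exists(dst)
        && dstFs.getFileStatus(dst).getLen() == srcFs.getFileStatus(src).getLen();
  }
}
{code}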