distcp leaves empty blocks after successful execution
----------------------------------------------------
Key: HADOOP-3294
URL: https://issues.apache.org/jira/browse/HADOOP-3294
Project: Hadoop Core
Issue Type: Bug
Affects Versions: 0.16.3
Environment: 0.16.3 without any patches. Dfs permissions turned off
everywhere, such that HADOOP-3138 and HADOOP-3186 do not apply
Reporter: Christian Kunz
I copied around 40 TB between two Hadoop clusters, with distcp running on
the source cluster.
The job was *successful*, but one destination file was empty because its only
block was empty.
None of the distcp log files mention this file.
There were a couple of messages in the namenode server log of the destination
cluster referencing the file:
hadoop-xxx-namenode-yyy.log.2008-04-19:2008-04-19 02:19:15,666 INFO
org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.allocateBlock:
destinationDir/_distcp_tmp_z0g93p/fileName. blk_-9209890281741927376
hadoop-xxx-namenode-yyy.log.2008-04-19:2008-04-19 02:54:45,820 WARN
org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate:
attempt to release a create lock on destinationDir/_distcp_tmp_z0g93p/fileName
file does not exist.
distcp should not rely on the user to double-check.
Would it make sense to add a reducer that compares destination file sizes with
source file sizes and takes appropriate action on a mismatch?
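
To illustrate the kind of check proposed above, here is a minimal sketch in
Python. It is not distcp code and not a proposed patch. It assumes the file
lengths of both clusters have already been listed into two dictionaries keyed
by relative path; the function name find_size_mismatches and the sample paths
and sizes are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def find_size_mismatches(
    source_sizes: Dict[str, int],
    dest_sizes: Dict[str, int],
) -> List[Tuple[str, int, Optional[int]]]:
    """Return (path, source_len, dest_len) for every source file whose
    destination copy is missing or has a different length.

    dest_len is None when the destination has no entry for the path.
    """
    mismatches = []
    for path, src_len in sorted(source_sizes.items()):
        dst_len = dest_sizes.get(path)  # None if the file was never created
        if dst_len != src_len:
            mismatches.append((path, src_len, dst_len))
    return mismatches


if __name__ == "__main__":
    # Hypothetical listings: one file arrived empty, one is missing entirely.
    source = {"dir/a": 1024, "dir/fileName": 134217728, "dir/c": 10}
    dest = {"dir/a": 1024, "dir/fileName": 0}
    for path, src_len, dst_len in find_size_mismatches(source, dest):
        print(f"MISMATCH {path}: source={src_len} destination={dst_len}")
```

A check like this would have flagged the empty destination file even though
the job reported success. What the "appropriate action" should be (failing the
job, retrying the copy, or only logging) is left open here.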