[ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592144#action_12592144 ]
Chris Douglas commented on HADOOP-3294:
---------------------------------------
bq. Perhaps we should have a distcp 'sync' mode where it first checks if each
source and target have the same length and/or date and skips copying when they
do.
This is already in distcp as {{-update}}. Its semantics are a little odd: it
assumes that the src tree matches the destination, rather than following the
usual cp semantics, but it overwrites a destination file iff its size differs
from that of the source file.
+1
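
As a rough illustration of the size-only rule that {{-update}} applies, here is a minimal Python sketch. It is not distcp's actual code: the function name {{plan_update}}, the path-to-size dictionaries, and the sample paths are all hypothetical, and treating a missing destination file as "needs copy" is an assumption.

```python
from typing import Dict, List


def plan_update(src_sizes: Dict[str, int], dst_sizes: Dict[str, int]) -> List[str]:
    """Return the relative paths that a size-based update would (re)write.

    Both arguments map a path relative to the tree root to a file length
    in bytes. A file is selected when the destination copy is missing
    (assumption) or when its length differs from the source length.
    """
    to_copy = []
    for rel_path, src_size in sorted(src_sizes.items()):
        dst_size = dst_sizes.get(rel_path)  # None when the destination is missing
        if dst_size is None or dst_size != src_size:
            to_copy.append(rel_path)
    return to_copy


# Hypothetical listing: one file already matches, one is empty on the
# destination, and one was never copied.
src = {"a/part-0": 1024, "a/part-1": 2048, "b/part-0": 512}
dst = {"a/part-0": 1024, "a/part-1": 0}
print(plan_update(src, dst))  # ['a/part-1', 'b/part-0']
```

Because only lengths are compared, a destination file with the right size but different contents would be skipped under this rule.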
> distcp leaves empty blocks after successful execution
> -----------------------------------------------------
>
> Key: HADOOP-3294
> URL: https://issues.apache.org/jira/browse/HADOOP-3294
> Project: Hadoop Core
> Issue Type: Bug
> Components: util
> Affects Versions: 0.16.3
> Environment: 0.16.3 without any patches. Dfs permissions turned off
> everywhere, such that HADOOP-3138 and HADOOP-3186 do not apply
> Reporter: Christian Kunz
> Assignee: Tsz Wo (Nicholas), SZE
> Attachments: 3294_20080423.patch, 3294_20080423b.patch
>
>
> I copied around 40 TB between two hadoop clusters, with distcp running on
> the source cluster.
> The job was *successful*, but one destination file was empty because its only
> block was empty.
> None of the distcp log files mention this file.
> There were a couple of messages in the namenode server log of the destination
> cluster referencing the file:
> hadoop-xxxnamenode-yyy.log.2008-04-19:2008-04-19 02:19:15,666 INFO
> org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.allocateBlock:
> destinationDir/_distcp_tmp_z0g93p/fileName. blk_-9209890281741927376
> hadoop-xxx-namenode-yyy.log.2008-04-19:2008-04-19 02:54:45,820 WARN
> org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate:
> attempt to release a create lock on
> destinationDir/_distcp_tmp_z0g93p/fileName file does not exist.
> distcp should not rely on the user to double-check.
> Would it make sense to add a reducer to compare destination file sizes with
> source file sizes and take some appropriate action?
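
The post-copy check proposed in the description could look roughly like the following Python sketch. It is only an illustration under stated assumptions, not an existing distcp feature: {{find_size_mismatches}}, the dictionaries of file lengths, and the sample paths and sizes are hypothetical, and reporting is the only "appropriate action" shown.

```python
from typing import Dict, List, Optional, Tuple


def find_size_mismatches(
    src_sizes: Dict[str, int], dst_sizes: Dict[str, int]
) -> List[Tuple[str, int, Optional[int]]]:
    """Return (path, expected_size, actual_size) for every bad destination file.

    actual_size is None when the destination file does not exist at all.
    """
    mismatches = []
    for path, expected in sorted(src_sizes.items()):
        actual = dst_sizes.get(path)
        if actual != expected:
            mismatches.append((path, expected, actual))
    return mismatches


# Hypothetical sizes: the first destination file is empty although the
# copy job reported success.
src_sizes = {"dir/fileName": 134217728, "dir/other": 42}
dst_sizes = {"dir/fileName": 0, "dir/other": 42}

for path, expected, actual in find_size_mismatches(src_sizes, dst_sizes):
    print(f"size mismatch: {path} expected={expected} actual={actual}")
```

A real job would have to decide whether to fail, retry the copy, or only log each mismatch; that choice is left open here.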
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.