[ https://issues.apache.org/jira/browse/HADOOP-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720902#action_12720902 ]
Tsz Wo (Nicholas), SZE commented on HADOOP-6051:
------------------------------------------------

>hadoop distcp -update srcfilename destfilename
>
>it seems to be comparing checksums of srcfilename and destfilename/srcfilename
>and so skip is not done. It should compare checksums of srcfilename and
>destfilename.

Actually, this is the correct behavior according to the [doc|http://hadoop.apache.org/core/docs/r0.20.0/distcp.html]. Quoted -update description below:
{quote}
As noted in the preceding, this is not a "sync" operation. The only criterion examined is the source and destination file sizes; if they differ, the source file replaces the destination file. As discussed in the [following|http://hadoop.apache.org/core/docs/r0.20.0/distcp.html#uo], it also changes the semantics for generating destination paths, so users should use this carefully.
{quote}

> distcp does not skip copying file if we are updating single file
> ----------------------------------------------------------------
>
>                 Key: HADOOP-6051
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6051
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: tools/distcp
>    Affects Versions: 0.21.0
>            Reporter: Ravi Gummadi
>             Fix For: 0.21.0
>
>
> distcp doesn't skip copying file when we do -update on single file if the
> destfile already exists.
> When we do
> hadoop distcp -update srcfilename destfilename
> it seems to be comparing checksums of srcfilename and
> destfilename/srcfilename and so skip is not done. It should compare checksums
> of srcfilename and destfilename.
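
For illustration only, the size-only criterion in the quoted -update description can be sketched as below. This is not DistCp's code: `should_skip_copy` and its parameters are hypothetical names, and the sketch models only the size comparison, not the destination-path semantics under discussion in this issue.

```python
from typing import Optional


def should_skip_copy(src_size: int, dst_size: Optional[int]) -> bool:
    """Hypothetical model of the documented -update criterion.

    dst_size is None when the destination file does not exist.
    Per the quoted doc, only file sizes are examined: if they differ,
    the source file replaces the destination file.
    """
    if dst_size is None:
        # Nothing at the destination yet, so the file must be copied.
        return False
    # Equal sizes: the copy is skipped. No checksum is consulted here.
    return src_size == dst_size
```

Under this model, which destination path the source is compared against decides whether a skip happens: if the file is looked up at destfilename/srcfilename and nothing exists there, `dst_size` is None and the copy proceeds.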