[
https://issues.apache.org/jira/browse/MAPREDUCE-648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tsz Wo (Nicholas), SZE updated MAPREDUCE-648:
---------------------------------------------
Description:
h4. 1. distcp -update launches a job when the source paths contain at least one
directory, even though there is nothing to copy.
HADOOP-5675 added a fileCount > 0 check to decide whether to launch the job.
HADOOP-5762 then changed the check to fileCount + dirCount > 0 so that empty
directories are copied to the destination. With -update, however, dirCount is
incremented without checking whether the directory already exists at the
destination, so a distcp job is launched because dirCount > 0 even though there
is nothing to copy. With -update, the dirCount increment can be skipped when
the directory already exists at the destination.
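The proposed check can be sketched roughly as follows. This is only an illustration, not the actual DistCp code: the class name, the method shouldLaunchJob, and the use of an in-memory set of destination directories in place of a FileSystem lookup are all assumptions made to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the launch decision; names are illustrative only.
final class LaunchDecisionSketch {

    // srcDirs: directory paths relative to the source root.
    // existingDstDirs: stand-in for "does this directory exist at the destination?"
    static boolean shouldLaunchJob(int fileCount, List<String> srcDirs,
                                   Set<String> existingDstDirs, boolean update) {
        int dirCount = 0;
        for (String dir : srcDirs) {
            // Proposed fix: with -update, a directory that already exists
            // at the destination does not count as work to be done.
            if (update && existingDstDirs.contains(dir)) {
                continue;
            }
            dirCount++;
        }
        // Same condition as after HADOOP-5762.
        return fileCount + dirCount > 0;
    }

    public static void main(String[] args) {
        List<String> srcDirs = Arrays.asList("a", "a/b");
        Set<String> dstDirs = new HashSet<>(Arrays.asList("a", "a/b"));
        System.out.println(shouldLaunchJob(0, srcDirs, dstDirs, true));
        System.out.println(shouldLaunchJob(0, srcDirs, dstDirs, false));
    }
}
```

In this sketch, when every source directory already exists at the destination and there are no files to copy, the job is not launched under -update, while the same input without -update still launches it.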
h4. 2. distcp -update on a single file does not skip the copy when the
destination file already exists.
When we run
hadoop distcp -update srcfilename destfilename
distcp seems to compare the checksums of srcfilename and
destfilename/srcfilename, so the copy is not skipped. It should compare the
checksums of srcfilename and destfilename.
See also MAPREDUCE-644.
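The path mix-up in the second bug can be illustrated with a small sketch. It is not taken from DistCp: the class name, the method checksumTarget, and the plain-string path handling (no trailing slashes, "/" as separator) are assumptions for illustration, and the fallback branch only mirrors the behavior reported above.

```java
// Hypothetical sketch of choosing the path whose checksum is compared
// against the source file; names are illustrative only.
final class UpdateTargetSketch {

    static String checksumTarget(String src, String dst, boolean dstIsExistingFile) {
        if (dstIsExistingFile) {
            // Intended behavior: compare srcfilename with destfilename itself.
            return dst;
        }
        // Reported behavior: the source file name is appended to the destination.
        String srcName = src.substring(src.lastIndexOf('/') + 1);
        return dst + "/" + srcName;
    }

    public static void main(String[] args) {
        System.out.println(checksumTarget("/data/in/part-0", "/backup/part-0", true));
        System.out.println(checksumTarget("/data/in/part-0", "/backup/part-0", false));
    }
}
```

The point of the sketch is only that the existing destination file must be recognized before the source name is appended; otherwise the comparison targets a path that does not exist and the copy is never skipped.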
was:
distcp -update launches job when there is at least one dir in source paths to
be copied, even though there is nothing to copy.
HADOOP-5675 added fileCount > 0 to be checked to decide whether to launch job.
And HADOOP-5762 changed this to fileCount + dirCount > 0 to solve the issue of
empty directories not getting copied to destination. With -update, dirCount is
incremented without checking if that dir already exists at the destination. So
distcp job is launched because of dirCount > 0 even though there is nothing to
copy. Incrementing dirCount can be skipped if that dir already exists at the
destination in case of -update.
Summary: Two distcp bugs (was: distcp -update launches job when there
is at least one dir in source paths to be copied, even though there is nothing
to copy)
Included the description of MAPREDUCE-644.
> Two distcp bugs
> ---------------
>
> Key: MAPREDUCE-648
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-648
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: distcp
> Reporter: Ravi Gummadi
> Assignee: Ravi Gummadi
> Priority: Minor
> Fix For: 0.21.0
>
> Attachments: d_648_644.patch, d_dirCount648.patch,
> d_dirCount648.v1.patch, d_dirCount_648.patch
>