[
https://issues.apache.org/jira/browse/HDFS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139334#comment-16139334
]
Xiaoyu Yao edited comment on HDFS-10234 at 8/24/17 12:03 AM:
-------------------------------------------------------------
[~linyiqun], thanks for the patch. I took a look at the v2 patch; here are my
comments:
1. We need to replace the COPY count update in createTargetDirsWithRetry with a
DIR_COPY count update. The change to update the DIR_COPY counter in map() can be
removed after that. Without that, [[email protected]]'s earlier
comment is not fully addressed.
{code}
@@ -260,7 +268,7 @@ private void createTargetDirsWithRetry(String description,
     } catch (Exception e) {
       throw new IOException("mkdir failed for " + target, e);
     }
-    incrementCounter(context, Counter.COPY, 1);
+    incrementCounter(context, Counter.DIR_COPY, 1);
   }
{code}
2. Can we include both the source (path, size) and destination (path, size) in
the SKIP/COPY log inside map()? The information is available from
sourceCurrStatus and targetStatus there. This way, applications can parse the
distcp log offline to get this information without adding extra load on the
NameNode.
3. We will need a switch (e.g., -v) to enable this additional log output, for
backward compatibility. By default, the log should contain only the information
it contains today.
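To make points 2 and 3 concrete, here is a rough, standalone sketch of how the
record could be built. It is an illustration only, not code from the patch: the
class name DistCpLogSketch, the format() helper, the "ACTION: path" default
layout, and the verbose layout are all assumptions. In CopyMapper the sizes
would presumably come from sourceCurrStatus.getLen() and targetStatus.getLen(),
with some handling for a target that does not exist yet.

```java
/** Hypothetical helper for illustration; not part of DistCp or the patch. */
final class DistCpLogSketch {

  private DistCpLogSketch() {
  }

  /**
   * Builds one log record for a SKIP or COPY event.
   *
   * When verbose is false, only the action and the source path are emitted
   * (assumed here to approximate the existing output). When verbose is true,
   * the source size plus the target path and size are appended.
   */
  static String format(String action, String srcPath, long srcLen,
      String dstPath, long dstLen, boolean verbose) {
    if (!verbose) {
      return action + ": " + srcPath;
    }
    return action + ": source=" + srcPath + ", size=" + srcLen
        + " --> target=" + dstPath + ", size=" + dstLen;
  }

  public static void main(String[] args) {
    // Default (non-verbose) record: action and source path only.
    System.out.println(
        format("SKIP", "/src/dir/f1", 1024L, "/dst/dir/f1", 1024L, false));
    // Verbose record: source and target paths with their sizes.
    System.out.println(
        format("COPY", "/src/dir/f2", 2048L, "/dst/dir/f2", 0L, true));
  }
}
```

Keeping the non-verbose branch unchanged means existing log parsers keep
working unless the new switch is passed explicitly.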
> DistCp log output should contain copied and deleted files and directories
> -------------------------------------------------------------------------
>
> Key: HDFS-10234
> URL: https://issues.apache.org/jira/browse/HDFS-10234
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: distcp
> Affects Versions: 2.7.1
> Reporter: Konstantin Shaposhnikov
> Assignee: Yiqun Lin
> Attachments: HDFS-10234.001.patch, HDFS-10234.002.patch
>
>
> DistCp log output (specified via {{-log}} command line option) currently
> contains only skipped and failed (when failures are ignored via {{-i}}) files.
> It will be more useful if it also contains copied and deleted files and
> created directories.
> This should be fixed in
> https://github.com/apache/hadoop/blob/branch-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java