MAPREDUCE-6471. Document distcp incremental copy. Contributed by Neelesh Srinivas Salian.


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/66dad854
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/66dad854
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/66dad854

Branch: refs/heads/HDFS-7240
Commit: 66dad854c0aea8c137017fcf198b165cc1bd8bdd
Parents: 1c030c6
Author: Harsh J <ha...@cloudera.com>
Authored: Mon Sep 28 13:12:43 2015 +0530
Committer: Harsh J <ha...@cloudera.com>
Committed: Mon Sep 28 13:12:43 2015 +0530

----------------------------------------------------------------------
 hadoop-mapreduce-project/CHANGES.txt                      | 3 +++
 hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm | 5 ++++-
 2 files changed, 7 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/66dad854/hadoop-mapreduce-project/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-mapreduce-project/CHANGES.txt b/hadoop-mapreduce-project/CHANGES.txt
index b7e9016..67adcbd 100644
--- a/hadoop-mapreduce-project/CHANGES.txt
+++ b/hadoop-mapreduce-project/CHANGES.txt
@@ -295,6 +295,9 @@ Release 2.8.0 - UNRELEASED
 
   IMPROVEMENTS
 
+    MAPREDUCE-6471. Document distcp incremental copy
+    (Neelesh Srinivas Salian via harsh)
+
     MAPREDUCE-5045. UtilTest#isCygwin method appears to be unused
     (Neelesh Srinivas Salian via harsh)
 

http://git-wip-us.apache.org/repos/asf/hadoop/blob/66dad854/hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
----------------------------------------------------------------------
diff --git a/hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm b/hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
index 7424267..aacf4c7 100644
--- a/hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
+++ b/hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
@@ -189,7 +189,9 @@ $H3 Update and Overwrite
   because it doesn't exist at the target. `10` and `20` are overwritten since
   the contents don't match the source.
 
-  If `-update` is used, `1` is overwritten as well.
+  If `-update` is used, `1` is skipped because the file-length and contents match. `2` is copied because it doesn’t exist at the target. `10` and `20` are overwritten since the contents don’t match the source. However, if `-append` is additionally used, then only `10` is overwritten (source length less than destination) and `20` is appended with the change in file (if the files match up to the destination's original length).
+
+  If `-overwrite` is used, `1` is overwritten as well.
 
 $H3 raw Namespace Extended Attribute Preservation
 
@@ -222,6 +224,7 @@ Flag              | Description                          | Notes
 `-m <num_maps>` | Maximum number of simultaneous copies | Specify the number of maps to copy data. Note that more maps may not necessarily improve throughput.
 `-overwrite` | Overwrite destination | If a map fails and `-i` is not specified, all the files in the split, not only those that failed, will be recopied. As discussed in the Usage documentation, it also changes the semantics for generating destination paths, so users should use this carefully.
 `-update` | Overwrite if source and destination differ in size, blocksize, or checksum | As noted in the preceding, this is not a "sync" operation. The criteria examined are the source and destination file sizes, blocksizes, and checksums; if they differ, the source file replaces the destination file. As discussed in the Usage documentation, it also changes the semantics for generating destination paths, so users should use this carefully.
+`-append` | Incremental copy of file with same name but different length | If the source file is greater in length than the destination file, the checksum of the common length part is compared. If the checksum matches, only the difference is copied using read and append functionalities. The -append option only works with `-update` without `-skipcrccheck`
 `-f <urilist_uri>` | Use list at \<urilist_uri\> as src list | This is equivalent to listing each source on the command line. The `urilist_uri` list should be a fully qualified URI.
 `-filelimit <n>` | Limit the total number of files to be <= n | **Deprecated!** Ignored in the new DistCp.
 `-sizelimit <n>` | Limit the total size to be <= n bytes | **Deprecated!** Ignored in the new DistCp.
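
For readers of this change, the per-file decision that the new `-update` and `-append` text describes can be sketched roughly as below. This is only an illustrative model, not DistCp's implementation: `plan_copy` and its parameters are hypothetical names, files are modelled as in-memory byte strings, and a CRC32 over the bytes stands in for the HDFS file checksum. Runs that pass neither `-update` nor `-overwrite` are outside the scope of the sketch.

```python
import zlib
from typing import Optional


def plan_copy(src: bytes, dst: Optional[bytes], update: bool = False,
              append: bool = False, overwrite: bool = False,
              skip_crc_check: bool = False) -> str:
    """Hypothetical model of the documented behaviour for a single file.

    Returns one of "copy", "skip", "overwrite" or "append".
    CRC32 is an assumed stand-in for the real file checksum.
    """
    # Documented constraint: -append only works with -update, without -skipcrccheck.
    if append and (not update or skip_crc_check):
        raise ValueError("-append requires -update and excludes -skipcrccheck")
    if not (update or overwrite):
        raise ValueError("this sketch only models -update or -overwrite runs")

    # A file that does not exist at the target is always copied.
    if dst is None:
        return "copy"

    # -overwrite replaces the target even when it already matches.
    if overwrite:
        return "overwrite"

    # -update: skip when length and contents match.
    if len(src) == len(dst) and zlib.crc32(src) == zlib.crc32(dst):
        return "skip"

    # -append: source is longer and the common-length part matches,
    # so only the difference needs to be written.
    if append and len(src) > len(dst):
        if zlib.crc32(src[:len(dst)]) == zlib.crc32(dst):
            return "append"

    # Any other mismatch: the source replaces the target.
    return "overwrite"
```

With made-up contents standing in for files `10` and `20`, `plan_copy(b"ab", b"abcd", update=True, append=True)` falls through to an overwrite because the source is shorter than the destination, while `plan_copy(b"abcdef", b"abcd", update=True, append=True)` takes the append path.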
