[ 
https://issues.apache.org/jira/browse/HADOOP-13023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438921#comment-16438921
 ] 

Rohit Pegallapati commented on HADOOP-13023:
--------------------------------------------

This looks in line with the intended behavior of the -update option, as described in the DistCp documentation:

[https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html]

{code}

{{-update}} is used to copy files from source that don’t exist at the target or 
differ from the target version. {{-overwrite}} overwrites target-files that 
exist at the target.

The Update and Overwrite options warrant special attention since their handling 
of source-paths varies from the defaults in a very subtle manner. Consider a 
copy from {{/source/first/}} and {{/source/second/}} to {{/target/}}, where the 
source paths have the following contents:

hdfs://nn1:8020/source/first/1
hdfs://nn1:8020/source/first/2
hdfs://nn1:8020/source/second/10
hdfs://nn1:8020/source/second/20

When DistCp is invoked without {{-update}} or {{-overwrite}}, the DistCp 
defaults would create directories {{first/}} and {{second/}}, under 
{{/target}}. Thus:

distcp hdfs://nn1:8020/source/first hdfs://nn1:8020/source/second hdfs://nn2:8020/target

would yield the following contents in {{/target}}:

hdfs://nn2:8020/target/first/1
hdfs://nn2:8020/target/first/2
hdfs://nn2:8020/target/second/10
hdfs://nn2:8020/target/second/20

When either {{-update}} or {{-overwrite}} is specified, the *contents* of the 
source-directories are copied to target, and not the source directories 
themselves. Thus:

distcp -update hdfs://nn1:8020/source/first hdfs://nn1:8020/source/second hdfs://nn2:8020/target
{code}
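
To make the difference concrete, here is a rough Python model of the path mapping the documentation describes. It is only an illustration of the documented semantics, not DistCp's actual implementation; the function name {{distcp_target_paths}} and its arguments are made up for this sketch, and it assumes the target directory already exists.

```python
import posixpath


def distcp_target_paths(sources, target, update_or_overwrite=False):
    """Return the target paths a copy would produce (hypothetical model).

    sources: dict mapping a source directory to the file names inside it.
    target: the target directory (assumed to already exist).
    update_or_overwrite: True models passing -update or -overwrite.
    """
    results = []
    for src_dir, names in sources.items():
        if update_or_overwrite:
            # Only the *contents* of each source directory land in target.
            base = target
        else:
            # Default: the source directory itself is recreated under target.
            base = posixpath.join(target, posixpath.basename(src_dir.rstrip("/")))
        for name in names:
            results.append(posixpath.join(base, name))
    return results


sources = {
    "/source/first": ["1", "2"],
    "/source/second": ["10", "20"],
}

# Default behavior: /target/first/1, /target/first/2, /target/second/10, ...
print(distcp_target_paths(sources, "/target"))

# With -update or -overwrite: /target/1, /target/2, /target/10, /target/20
print(distcp_target_paths(sources, "/target", update_or_overwrite=True))
```

In this model the source directory entry itself never appears on the target side when {{-update}} is set, which is the behavior the quoted documentation describes.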

> Distcp with -update feature on first time raw data not working
> --------------------------------------------------------------
>
>                 Key: HADOOP-13023
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13023
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools/distcp
>    Affects Versions: 2.6.0
>            Reporter: Mavin Martin
>            Priority: Major
>
> When running distcp with the -update option on encrypted data, the distcp 
> job reports success. However, reading the encrypted file at the target path 
> fails because the keyName does not exist. 
> Please see my example to reproduce the issue.
> {code}
> [root@xxx bin]# hdfs crypto -listZones
> /tmp/a/ted                                DEF0000000000013
> [root@xxx bin]# hdfs dfs -ls -R /tmp
> drwxr-xr-x   - xxx xxx          0 2016-04-14 00:22 /tmp/a
> drwxr-xr-x   - xxx xxx          0 2016-04-14 00:00 /tmp/a/ted
> -rw-r--r--   3 xxx xxx         33 2016-04-14 00:00 /tmp/a/ted/test.txt
> [root@xxx bin]# hadoop distcp -update /.reserved/raw/tmp/a/ted /.reserved/raw/tmp/a-with-update/ted
> [root@xxx bin]# hdfs crypto -listZones
> /tmp/a/ted                                DEF0000000000013
> [root@xxx bin]# hadoop distcp /.reserved/raw/tmp/a/ted /.reserved/raw/tmp/a-no-update/ted
> [root@xxx bin]# hdfs crypto -listZones
> /tmp/a/ted                                DEF0000000000013
> /tmp/a-no-update/ted                      DEF0000000000013
> {code}
> The crypto zone for 'a-with-update' should have been created since this is a 
> new destination.  You can verify this by looking at 'a-no-update'.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
