[
https://issues.apache.org/jira/browse/MAPREDUCE-6734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383326#comment-15383326
]
Frederick Tucker commented on MAPREDUCE-6734:
---------------------------------------------
Better description of the feature:
Sometimes it is desirable, particularly when using globs with source files, to
preserve the file structure of the source at the destination. The options
`-preservepath` and `-sourceprefixmask` allow distcp to maintain the file
structure at the destination.
If `-preservepath` is used, the absolute path of the source file will be
appended to the specified destination directory.
For example:
distcp -preservepath hdfs://nn1:9820/source/some/file \
    hdfs://nn2:9820/target
would yield the following contents in `/target`:
hdfs://nn2:9820/target/source/some/file
Sometimes the entire absolute path of the source file is not needed. The
option `-sourceprefixmask` takes a path prefix and removes it from the start
of the source file's absolute path before that path is appended to the
destination.
For example:
distcp -preservepath -sourceprefixmask /source \
    hdfs://nn1:9820/source/some/file hdfs://nn2:9820/target
would yield the following contents in `/target`:
hdfs://nn2:9820/target/some/file
Both options also work with source file globbing.
For example:
distcp -preservepath -sourceprefixmask /source \
    hdfs://nn1:9820/source/*/file hdfs://nn2:9820/target
with the following sources:
hdfs://nn1:9820/source/first/file
hdfs://nn1:9820/source/second/file
hdfs://nn1:9820/source/third/file
hdfs://nn1:9820/source/fourth/file
would yield the following contents in `/target`:
hdfs://nn2:9820/target/first/file
hdfs://nn2:9820/target/second/file
hdfs://nn2:9820/target/third/file
hdfs://nn2:9820/target/fourth/file
Other Notes:
* `-sourceprefixmask` does not support globbing.
* Only one value can be passed to `-sourceprefixmask`.
* Only file systems that use the forward slash `/` separator are supported.
* If the value passed to `-sourceprefixmask` does not match the start of the
source file's absolute path, the entire absolute path will be preserved at the
target.
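The path mapping described above can be sketched roughly as follows. This is
an illustration only, not the code from the patch: the function name
`target_path` and its parameters are hypothetical, and it assumes the mask is
matched on whole path components (the description only says it must match the
start of the absolute path).

```python
import posixpath
from urllib.parse import urlparse


def target_path(source_uri, target_uri, preserve_path=False, source_prefix_mask=None):
    """Hypothetical model of where a source file lands under the target."""
    # Absolute path of the source file, e.g. "/source/some/file".
    src_path = urlparse(source_uri).path
    target = target_uri.rstrip("/")

    if not preserve_path:
        # Without -preservepath, files are copied flat by file name.
        return target + "/" + posixpath.basename(src_path)

    relative = src_path
    if source_prefix_mask:
        mask = source_prefix_mask.rstrip("/")
        # Assumption: the mask must match on a path component boundary.
        if src_path.startswith(mask + "/"):
            relative = src_path[len(mask):]
        # Otherwise the mask does not match and the full path is kept.

    return target + "/" + relative.lstrip("/")


if __name__ == "__main__":
    for name in ("first", "second", "third", "fourth"):
        src = "hdfs://nn1:9820/source/%s/file" % name
        print(target_path(src, "hdfs://nn2:9820/target", True, "/source"))
```

Run as a script, this prints the four target paths from the globbing example.
A mask that does not match, such as `/other`, leaves the full source path in
place under the target, as the last note describes.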
> Add option to distcp to preserve file path structure of source files at the
> destination
> ---------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-6734
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6734
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: distcp
> Affects Versions: 3.0.0-alpha2
> Environment: Software platform
> Reporter: Frederick Tucker
> Priority: Critical
> Labels: distcp, newbie, patch
> Fix For: 3.0.0-alpha2
>
> Attachments: MAPREDUCE-6734.3.0.0-alpha2.patch,
> MAPREDUCE-6734.3.0.0-alpha2.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> When copying files using distcp with globbed source files, all the matched
> files in the glob are copied in a single flat directory. This causes
> problems when the file structure at the source is important. It also is an
> issue when there are two files matched in the glob with the same name because
> it causes a duplicate file error at the target. I'd like to have an option
> to preserve the file structure of the source files when globbing inputs.