Mohammad Kamrul Islam created MAPREDUCE-6713:
------------------------------------------------
Summary: Distcp doesn't provide the option to override the default
staging directory
Key: MAPREDUCE-6713
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6713
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: distcp
Affects Versions: 2.5.1
Reporter: Mohammad Kamrul Islam
*Current state and shortcoming*
=======================
By default, distcp writes temporary files into
$TARGET_PATH/.distcp.tmp/$taskattemptid (see
RetriableFileCopyCommand#getTmpFile). There is no way for a user to override
this staging/tmp directory. The problem is most visible on S3 with versioning
enabled. For example, a user may want to turn on S3 versioning only for the
target directory and not for the staging/tmp directory. Because the staging
directory sits under the target path, distcp currently also creates versions
for the temporary files written there, and there can be a lot of them. If the
user could point staging at a non-versioned S3 path, the target would stay
cleaner.
*Proposed solution*
==============
Provide a new option (-stage) through which the user can optionally specify a
path on the target FS. Distcp mapper tasks will write their temporary files
into that directory.
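As a rough illustration of the intended behavior (not distcp's actual code), the
path resolution could look like the sketch below. The class and method names,
the bucket names, and the choice to keep the ".distcp.tmp" layout under the
staging directory are all assumptions made for this example; only the default
layout under $TARGET_PATH comes from the description above.

```java
public class StagingPathSketch {

    /**
     * Hypothetical helper: returns the temporary path a mapper task would
     * write to. When no staging directory is given, it falls back to the
     * current default under the target path.
     */
    static String resolveTmpFile(String targetPath, String stagingDir,
                                 String taskAttemptId) {
        // Use the override if present, otherwise the target path (current behavior).
        String base = (stagingDir == null || stagingDir.isEmpty())
                ? targetPath
                : stagingDir;
        // Drop trailing slashes so the joined path has single separators.
        while (base.length() > 1 && base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        // Assumption: the ".distcp.tmp/<attempt id>" layout is kept under the override.
        return base + "/.distcp.tmp/" + taskAttemptId;
    }

    public static void main(String[] args) {
        String attempt = "attempt_0001_m_000000_0"; // made-up attempt id

        // Default: temporary files land under the (versioned) target path.
        System.out.println(resolveTmpFile(
                "s3a://example-bucket/target", null, attempt));

        // With an override: temporary files go to a separate, non-versioned path.
        System.out.println(resolveTmpFile(
                "s3a://example-bucket/target", "s3a://example-staging/tmp/", attempt));
    }
}
```

Falling back to the target path when the option is absent would keep existing
jobs unaffected; only users who pass the new option would see a different
staging location.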
*Possible Confusions*
=================
There is another distcp option (-tmp) that could be assumed to serve the same
purpose. However, that option works only together with the "-atomic" option,
which gives temporary files a different meaning.
Another possible confusion is the staging directory used by the MapReduce
framework. The proposed temp directory is specific to distcp.