Mohammad Kamrul Islam created MAPREDUCE-6713:
------------------------------------------------
Summary: Distcp doesn't provide the option to override the default staging directory
Key: MAPREDUCE-6713
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6713
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: distcp
Affects Versions: 2.5.1
Reporter: Mohammad Kamrul Islam

*Current state and shortcoming*
=======================
By default, distcp writes temporary files into $TARGET_PATH/.distcp.tmp/$taskattemptid (see RetriableFileCopyCommand#getTmpFile). There is no way for a user to override this staging/tmp directory.

The problem is most visible on S3 with versioning enabled. For example, a user may want to turn on S3 versioning only for the target directory, not for the staging/tmp directory. Currently, distcp also creates versions for the staging directory, which can contain a large number of temporary files. If the user could point staging at a non-versioned S3 path, the versioned target would not accumulate versions of temporary files.

*Proposed solution*
==============
Provide a new option (-stage) that lets the user optionally specify a path on the target FS. Distcp mapper tasks will write their temporary files into that directory.

*Possible Confusions*
=================
Another distcp option (-tmp) could be assumed to serve the same purpose, but it works only together with the "-atomic" option, which has a different notion of temporary files.

A second possible confusion is the staging directory used by the MapReduce framework. The proposed temporary directory is specific to distcp.
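The path selection described in the proposal can be sketched as follows. This is a minimal illustration, not the actual DistCp code: the class and method names are hypothetical, paths are handled as plain strings instead of Hadoop Path objects, the bucket names and attempt id are made up, and the default layout follows the description in this issue ($TARGET_PATH/.distcp.tmp/$taskattemptid). The stage argument stands in for the value of the proposed -stage option, which does not exist yet.

```java
// Hypothetical sketch of the proposed behavior; not taken from the DistCp source.
public class StagingPathSketch {

    /**
     * Returns the temporary file path for one task attempt.
     * If stageDir is null or empty, falls back to the default location
     * under the target path, as described in this issue.
     */
    public static String tmpFile(String targetPath, String stageDir, String taskAttemptId) {
        String root = (stageDir == null || stageDir.isEmpty())
                ? stripTrailingSlash(targetPath) + "/.distcp.tmp"   // current default
                : stripTrailingSlash(stageDir);                      // proposed override
        return root + "/" + taskAttemptId;
    }

    /** Removes a single trailing slash so joined paths do not contain "//". */
    public static String stripTrailingSlash(String path) {
        return (path.length() > 1 && path.endsWith("/"))
                ? path.substring(0, path.length() - 1)
                : path;
    }

    public static void main(String[] args) {
        String attempt = "attempt_0001_m_000000_0"; // illustrative attempt id

        // No override: temporary file lands under the (versioned) target.
        System.out.println(tmpFile("s3a://versioned-bucket/target", null, attempt));

        // Override: temporary file lands under a separate, non-versioned location.
        System.out.println(tmpFile("s3a://versioned-bucket/target",
                "s3a://scratch-bucket/stage", attempt));
    }
}
```

Keeping the fallback identical to today's default means that jobs which do not pass the new option would behave as before.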