Andrew Olson created HADOOP-16147:
-------------------------------------

             Summary: Allow CopyListing sequence file keys and values to be more easily customized
                 Key: HADOOP-16147
                 URL: https://issues.apache.org/jira/browse/HADOOP-16147
             Project: Hadoop Common
          Issue Type: Improvement
          Components: tools/distcp
            Reporter: Andrew Olson

We have encountered a scenario in which, when using the Crunch library to run a distributed copy (CRUNCH-660, CRUNCH-675) at the conclusion of a job, we need to dynamically rename target paths to the preferred destination output part file names rather than retain the original source path names.

A custom CopyListing implementation appears to be the proper solution for this. However, the SimpleCopyListing logic that needs to be adjusted is in a private method (writeToFileListing), so a relatively large portion of the class would need to be cloned.

To minimize the code duplication required for such a custom implementation, we propose adding two new protected methods to the CopyListing class that can be used to change the actual keys and/or values written to the copy listing sequence file:

{noformat}
protected Text getFileListingKey(Path sourcePathRoot, CopyListingFileStatus fileStatus);

protected CopyListingFileStatus getFileListingValue(CopyListingFileStatus fileStatus);
{noformat}

The SimpleCopyListing class would then be modified to consume these methods as follows:

{noformat}
fileListWriter.append(
    getFileListingKey(sourcePathRoot, fileStatus),
    getFileListingValue(fileStatus));
{noformat}

The default implementations would simply preserve the present behavior of the SimpleCopyListing class, and could reside in either CopyListing or SimpleCopyListing, whichever is preferable.

{noformat}
protected Text getFileListingKey(Path sourcePathRoot, CopyListingFileStatus fileStatus) {
  return new Text(DistCpUtils.getRelativePath(sourcePathRoot, fileStatus.getPath()));
}

protected CopyListingFileStatus getFileListingValue(CopyListingFileStatus fileStatus) {
  return fileStatus;
}
{noformat}

Please let me know if this proposal seems to be on the right track. If so, I can provide a patch.
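For illustration, here is a rough sketch of how a custom implementation might use the proposed hooks to rename targets. It is not the actual patch. So that it compiles without Hadoop on the classpath, plain Strings stand in for Path, Text and CopyListingFileStatus, a Map stands in for the sequence file, and the names CopyListingHooksSketch, SimpleListing, RenamingListing and the rename map are all hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: Strings replace Hadoop's Path, Text and CopyListingFileStatus.
public class CopyListingHooksSketch {

    /** Stand-in for SimpleCopyListing with the two proposed protected hooks. */
    static class SimpleListing {
        // Stand-in for the copy listing sequence file (key -> value).
        final Map<String, String> fileList = new LinkedHashMap<>();

        // Default key: source path relative to the source root
        // (a simplified stand-in for DistCpUtils.getRelativePath).
        protected String getFileListingKey(String sourcePathRoot, String sourcePath) {
            return sourcePath.substring(sourcePathRoot.length());
        }

        // Default value: the file status, unchanged.
        protected String getFileListingValue(String sourcePath) {
            return sourcePath;
        }

        // Stand-in for writeToFileListing: the shared logic stays here,
        // and only the key/value choice is delegated to the hooks.
        final void writeToFileListing(String sourcePathRoot, String sourcePath) {
            fileList.put(
                getFileListingKey(sourcePathRoot, sourcePath),
                getFileListingValue(sourcePath));
        }
    }

    /** Hypothetical custom listing that renames selected target paths. */
    static class RenamingListing extends SimpleListing {
        private final Map<String, String> renames;

        RenamingListing(Map<String, String> renames) {
            this.renames = renames;
        }

        @Override
        protected String getFileListingKey(String sourcePathRoot, String sourcePath) {
            String defaultKey = super.getFileListingKey(sourcePathRoot, sourcePath);
            // Fall back to the default key when no rename is registered.
            return renames.getOrDefault(defaultKey, defaultKey);
        }
    }

    public static void main(String[] args) {
        Map<String, String> renames = new HashMap<>();
        renames.put("/out/part-m-00000", "/final/data-00000");

        RenamingListing listing = new RenamingListing(renames);
        listing.writeToFileListing("/src", "/src/out/part-m-00000");
        listing.writeToFileListing("/src", "/src/out/part-m-00001");

        // Print each listing entry as "target key <- source value".
        listing.fileList.forEach((k, v) -> System.out.println(k + " <- " + v));
    }
}
```

With this shape, a custom listing overrides only the key (and, if needed, the value) hook and inherits the rest of the listing logic, which is the duplication the proposal aims to avoid.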