Zheng Shao created HADOOP-14137:
-----------------------------------

             Summary: Allow DistCp to take a file list within a src directory
                 Key: HADOOP-14137
                 URL: https://issues.apache.org/jira/browse/HADOOP-14137
             Project: Hadoop Common
          Issue Type: New Feature
          Components: tools/distcp
            Reporter: Zheng Shao

DistCp is very slow to start when the src directory has a huge number of subdirectories. In our case, we already have the directory listing (via "hdfs oiv -i fsimage" or via nightly "hdfs dfs -ls -R /" dumps), and we would like to use that instead of doing a real-time listing on the NameNode.

The "-f" option doesn't help in this case because it would try to put everything into a single flat target directory.

We'd like to introduce a new option "-list <file>" for distcp. The <file> contains the result of listing the src directory.

In order to achieve this, we plan to:
1. Add a new CopyListing class, PregeneratedCopyListing, similar to SimpleCopyListing, which does not recursively list the src directory (as "-ls -R" would) but instead takes the listing supplied via "-list".
2. Add an option "-list <file>" which will automatically make distcp use the new PregeneratedCopyListing class.
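
To illustrate the idea behind the proposed PregeneratedCopyListing, here is a minimal Python sketch of how a saved listing could be turned into copy entries that keep the directory structure under the target. It is not the DistCp implementation: the function name plan_copies, the Entry tuple, and the example paths are hypothetical, and it assumes the listing file holds lines in the usual eight-column "hdfs dfs -ls -R" layout (permissions, replication, owner, group, size, date, time, path).

```python
import posixpath
from typing import Iterable, Iterator, NamedTuple


class Entry(NamedTuple):
    """One planned copy (hypothetical structure, for illustration only)."""
    src: str
    dst: str
    is_dir: bool
    size: int


def plan_copies(listing_lines: Iterable[str], src_root: str,
                dst_root: str) -> Iterator[Entry]:
    """Map pre-generated listing lines to (src, dst) pairs.

    Each path is made relative to src_root and re-rooted under dst_root,
    so subdirectories are preserved instead of being flattened.
    """
    src_root = src_root.rstrip("/") or "/"
    prefix = src_root if src_root.endswith("/") else src_root + "/"
    for line in listing_lines:
        # Split into at most 8 fields so paths containing spaces stay intact.
        fields = line.rstrip("\n").split(None, 7)
        if len(fields) < 8:
            continue  # blank lines or anything that is not a listing row
        perms, _repl, _owner, _group, size, _date, _time, path = fields
        if not path.startswith(prefix):
            continue  # entry lies outside the src directory
        rel = path[len(prefix):]
        yield Entry(src=path,
                    dst=posixpath.join(dst_root, rel),
                    is_dir=perms.startswith("d"),
                    size=int(size))
```

No NameNode call is needed to build the plan, which is the point of the proposal: the cost of listing is paid once, offline, when the dump is produced.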