[
https://issues.apache.org/jira/browse/HADOOP-14137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892077#comment-15892077
]
Steve Loughran commented on HADOOP-14137:
-----------------------------------------
I'd actually like the listing to be done via listFiles(path, recursive=true).
Why: on object stores we can do the recursive listing as a flat operation,
rather than a treewalk. It wouldn't help here, as HDFS still does the walk.
Moving the listing onto an fsimage would reduce NN load, so make for happy
people all round.
Not sure about the option name; something to make clear it's a file, e.g.
{{--listingFile <path>}}? Supporting hdfs:// as well as file:// paths could be
useful in future: you could store the listing in HDFS, ready for next time.
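
A rough sketch of how entries from such a listing file might be mapped onto
the destination while keeping the directory tree, which is what "-f" doesn't
do. Assumptions: the file holds one absolute source path per line (this is not
the actual output format of "hdfs oiv" or "-ls -R"), and the names
ListingFileSketch and mapToTarget are made up for illustration; neither is
part of distcp.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListingFileSketch {

    /**
     * Maps every listed source path to a target path that keeps the same
     * position relative to srcRoot. Blank lines are skipped.
     * Hypothetical helper, not a distcp API.
     */
    static List<String> mapToTarget(String srcRoot, String dstRoot, List<String> listing) {
        // Normalise both roots so they end with exactly one separator.
        String srcPrefix = srcRoot.endsWith("/") ? srcRoot : srcRoot + "/";
        String dstPrefix = dstRoot.endsWith("/") ? dstRoot : dstRoot + "/";

        List<String> targets = new ArrayList<>();
        for (String line : listing) {
            String path = line.trim();
            if (path.isEmpty()) {
                continue;
            }
            if (!path.startsWith(srcPrefix)) {
                throw new IllegalArgumentException("Not under " + srcRoot + ": " + path);
            }
            // Re-root the relative part of the path under the destination.
            targets.add(dstPrefix + path.substring(srcPrefix.length()));
        }
        return targets;
    }

    public static void main(String[] args) {
        // Assumed listing content: one absolute path per line.
        List<String> listing = Arrays.asList(
                "/data/logs/2017/03/01/part-0000",
                "/data/logs/2017/03/02/part-0000");
        for (String target : mapToTarget("/data/logs", "/backup/logs", listing)) {
            System.out.println(target);
        }
    }
}
```

A real implementation would also need file lengths, permissions and the like
from the listing, and some check that the listing isn't stale.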
> Faster distcp by taking file list from fsimage or -lsr result
> -------------------------------------------------------------
>
> Key: HADOOP-14137
> URL: https://issues.apache.org/jira/browse/HADOOP-14137
> Project: Hadoop Common
> Issue Type: New Feature
> Components: tools/distcp
> Reporter: Zheng Shao
>
> DistCp is very slow to start when the src directory has a huge number of
> subdirectories. In our case, we already have the directory listing (via
> "hdfs oiv -i fsimage" or via nightly "hdfs dfs -ls -R /" dumps), and we would
> like to use that instead of doing realtime listing on the NameNode.
> The "-f" option doesn't help in this case because it would try to put
> everything into a single flat target directory.
> We'd like to introduce a new option "-list <file>" for distcp. The <file>
> contains the result of listing the src directory.
> In order to achieve this, we plan to:
> 1. Add a new CopyListing class PregeneratedCopyListing similar to
> SimpleCopyListing which doesn't "-ls -R" into the directory, but takes the
> listing via "-list"
> 2. Add an option "-list <file>" which will automatically make distcp use the
> new PregeneratedCopyListing class.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)