[ 
https://issues.apache.org/jira/browse/HADOOP-14137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897439#comment-15897439
 ] 

Erik Krogen commented on HADOOP-14137:
--------------------------------------

Hey [~zshao], unfortunately I don't really have the bandwidth at this time. 
For our project we ended up going in a slightly different direction, so I 
don't really have code I can directly contribute either. I'm happy to stay 
involved in the discussion and offer suggestions, and I might be able to 
contribute code at some point.

FYI on object stores, I think Steve is probably referring to S3. You can see 
the {{s3a}} filesystem and {{s3guard}} for more info.

> Faster distcp by taking file list from fsimage or -lsr result
> -------------------------------------------------------------
>
>                 Key: HADOOP-14137
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14137
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: tools/distcp
>            Reporter: Zheng Shao
>         Attachments: HADOOP-14137.branch26.1.patch, 
> HADOOP-14137.branch26.2.patch
>
>
> DistCp is very slow to start when the src directory has a huge number of 
> subdirectories.  In our case, we already have the directory listing (via 
> "hdfs oiv -i fsimage" or via nightly "hdfs dfs -ls -R /" dumps), and we would 
> like to use that instead of doing a realtime listing on the NameNode.
> The "-f" option doesn't help in this case because it would try to put 
> everything into a single flat target directory.
> We'd like to introduce a new option "-list <file>" for distcp.  The <file> 
> contains the result of listing the src directory.
> In order to achieve this, we plan to:
> 1. Add a new CopyListing class PregeneratedCopyListing, similar to 
> SimpleCopyListing, which doesn't run "-ls -R" on the directory but takes the 
> listing via "-list"
> 2. Add an option "-list <file>" which will automatically make distcp use the 
> new PregeneratedCopyListing class.
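
To make the idea in the description concrete, here is a rough, standalone sketch of what consuming a pre-generated listing could look like. It is not taken from the attached patches and does not use the real DistCp CopyListing API. The class name PregeneratedListingSketch, the Entry type, and the helpers parse and targetFor are all hypothetical. It assumes each listing line has the eight whitespace-separated columns that "hdfs dfs -ls -R" usually prints (permissions, replication, owner, group, size, date, time, path), and it maps each source path to a target path that keeps the directory structure, which is the part "-f" does not give you.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: not the HADOOP-14137 patch and not the DistCp API.
public class PregeneratedListingSketch {

    /** One parsed listing entry: directory flag, size in bytes, absolute source path. */
    public static final class Entry {
        public final boolean directory;
        public final long length;
        public final String path;

        Entry(boolean directory, long length, String path) {
            this.directory = directory;
            this.length = length;
            this.path = path;
        }
    }

    /** Parses listing lines; lines without the assumed 8 columns are skipped. */
    public static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (String line : lines) {
            // Limit of 8 keeps a path containing spaces in the last field.
            String[] f = line.trim().split("\\s+", 8);
            if (f.length < 8 || !f[4].matches("\\d+")) {
                continue; // e.g. blank lines or "Found N items" headers
            }
            entries.add(new Entry(f[0].startsWith("d"), Long.parseLong(f[4]), f[7]));
        }
        return entries;
    }

    private static String stripTrailingSlash(String p) {
        return p.endsWith("/") ? p.substring(0, p.length() - 1) : p;
    }

    /** Maps a source path under srcRoot to the same relative location under dstRoot. */
    public static String targetFor(String srcRoot, String dstRoot, String path) {
        String src = stripTrailingSlash(srcRoot);
        String dst = stripTrailingSlash(dstRoot);
        if (path.equals(src)) {
            return dst;
        }
        if (!path.startsWith(src + "/")) {
            throw new IllegalArgumentException(path + " is not under " + srcRoot);
        }
        return dst + path.substring(src.length());
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList(
            "drwxr-xr-x   - hdfs supergroup          0 2017-03-01 12:00 /data/a",
            "-rw-r--r--   3 hdfs supergroup       1024 2017-03-01 12:01 /data/a/part-0");
        for (Entry e : parse(listing)) {
            System.out.println(e.path + " -> " + targetFor("/data", "/backup/data", e.path));
        }
    }
}
```

A real PregeneratedCopyListing would presumably also need to carry over whatever file status details DistCp records for each entry, and an fsimage-derived listing would need its own parser, since "hdfs oiv" output uses a different layout.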



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
