[ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463198 ]

[EMAIL PROTECTED] commented on HADOOP-862:
------------------------------------------

Updated patch.

+ Renamed DFSCopyFilesMapper to FSCopyFilesMapper.
+ If no scheme is given, use 'default' (the value of 'fs.default.name' in hadoop-site.xml).
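
The scheme-defaulting rule above can be illustrated with a small sketch (the function name, the fallback URI, and the logic are illustrative assumptions, not the patch's actual code):

```python
from urllib.parse import urlparse

# Hypothetical illustration of the scheme-defaulting rule: a path with
# no scheme (e.g. "/user/stack/outputs") is resolved against the
# fs.default.name filesystem configured in hadoop-site.xml.
def resolve_scheme(path, default_fs="hdfs://namenode:9000"):
    parsed = urlparse(path)
    if parsed.scheme:                       # explicit, e.g. "s3://..."
        return parsed.scheme
    return urlparse(default_fs).scheme      # fall back to fs.default.name

print(resolve_scheme("s3://KEY:SECRET@bucket/segments-bkup"))  # s3
print(resolve_scheme("/user/stack/outputs/segments"))          # hdfs
```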

I ran more extensive tests copying from hdfs to s3 and back again, and copying from http into both s3 and hdfs (distcp is a nice tool).  For example, here is the output from a copy of a small nutch segment from hdfs to s3 (in the below, hdfs was set as the fs.default.name filesystem):

[EMAIL PROTECTED]:~/checkouts/hadoop$ ./bin/hadoop fs -lsr outputs/segments
/user/stack/outputs/segments/20070108213341-test        <dir>
/user/stack/outputs/segments/20070108213341-test/crawl_fetch    <dir>
/user/stack/outputs/segments/20070108213341-test/crawl_fetch/part-00000 <dir>
/user/stack/outputs/segments/20070108213341-test/crawl_fetch/part-00000/data    <r 1>   1187
/user/stack/outputs/segments/20070108213341-test/crawl_fetch/part-00000/index   <r 1>   234
/user/stack/outputs/segments/20070108213341-test/crawl_parse    <dir>
/user/stack/outputs/segments/20070108213341-test/crawl_parse/part-00000 <r 1>   9010
/user/stack/outputs/segments/20070108213341-test/parse_data     <dir>
/user/stack/outputs/segments/20070108213341-test/parse_data/part-00000  <dir>
/user/stack/outputs/segments/20070108213341-test/parse_data/part-00000/data     <r 1>   4630
/user/stack/outputs/segments/20070108213341-test/parse_data/part-00000/index    <r 1>   234
/user/stack/outputs/segments/20070108213341-test/parse_text     <dir>
/user/stack/outputs/segments/20070108213341-test/parse_text/part-00000  <dir>
/user/stack/outputs/segments/20070108213341-test/parse_text/part-00000/data     <r 1>   6180
/user/stack/outputs/segments/20070108213341-test/parse_text/part-00000/index    <r 1>   234

Here's the copy to an s3 directory named segments-bkup:

% ./bin/hadoop distcp /user/stack/outputs/segments s3://KEY:[EMAIL PROTECTED]/segments-bkup

Here's a listing of the s3 content:

[EMAIL PROTECTED]:~/checkouts/hadoop$ ./bin/hadoop fs -fs s3://KEY:[EMAIL PROTECTED]/segments-bkup -lsr /segments-bkup/
/segments-bkup/20070108213341-test      <dir>
/segments-bkup/20070108213341-test/crawl_fetch  <dir>
/segments-bkup/20070108213341-test/crawl_fetch/part-00000       <dir>
/segments-bkup/20070108213341-test/crawl_fetch/part-00000/data  <r 1>   1187
/segments-bkup/20070108213341-test/crawl_fetch/part-00000/index <r 1>   234
/segments-bkup/20070108213341-test/crawl_parse  <dir>
/segments-bkup/20070108213341-test/crawl_parse/part-00000       <r 1>   9010
/segments-bkup/20070108213341-test/parse_data   <dir>
/segments-bkup/20070108213341-test/parse_data/part-00000        <dir>
/segments-bkup/20070108213341-test/parse_data/part-00000/data   <r 1>   4630
/segments-bkup/20070108213341-test/parse_data/part-00000/index  <r 1>   234
/segments-bkup/20070108213341-test/parse_text   <dir>
/segments-bkup/20070108213341-test/parse_text/part-00000        <dir>
/segments-bkup/20070108213341-test/parse_text/part-00000/data   <r 1>   6180
/segments-bkup/20070108213341-test/parse_text/part-00000/index  <r 1>   234
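
One quick sanity check on a copy like this is to compare the relative paths and file sizes from the source and destination `-lsr` listings. A minimal sketch (the helper name is mine, and the parsing simply assumes the whitespace-separated format shown above):

```python
def parse_lsr(listing, prefix):
    """Map each file's path (relative to prefix) to its size in bytes.

    Expects lines like the fs -lsr output above:
      /user/stack/.../part-00000/data  <r 1>  1187   (file, replication 1)
      /user/stack/.../crawl_fetch      <dir>         (directory, no size)
    """
    entries = {}
    for line in listing.strip().splitlines():
        parts = line.split()
        if parts[-1] == "<dir>":
            continue  # directories carry no size; compare files only
        entries[parts[0][len(prefix):]] = int(parts[-1])
    return entries

# Illustrative fragments of the two listings above:
src = parse_lsr("""
/user/stack/outputs/segments/20070108213341-test/crawl_parse <dir>
/user/stack/outputs/segments/20070108213341-test/crawl_parse/part-00000 <r 1> 9010
""", "/user/stack/outputs/segments")

dst = parse_lsr("""
/segments-bkup/20070108213341-test/crawl_parse <dir>
/segments-bkup/20070108213341-test/crawl_parse/part-00000 <r 1> 9010
""", "/segments-bkup")

assert src == dst  # same relative paths and sizes: copy looks complete
```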

> Add handling of s3 to CopyFile tool
> -----------------------------------
>
>                 Key: HADOOP-862
>                 URL: https://issues.apache.org/jira/browse/HADOOP-862
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 0.10.0
>            Reporter: [EMAIL PROTECTED]
>            Priority: Minor
>         Attachments: copyfiles-s3-2.diff, copyfiles-s3.diff
>
>
> CopyFile is a useful tool for doing bulk copies.  It doesn't have handling 
> for the recently added s3 filesystem.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
