Hi Tom.

Ah... From reading (your?) article:
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873&categoryID=112
I got confused; it seems to suggest that distcp is used to move
ordinary S3 objects onto HDFS. Thanks for the clarification. (A sketch
of the working sequence is at the end of this message.)

Cheers,

Einar

On Sat, May 31, 2008 at 11:58 PM, Tom White <[EMAIL PROTECTED]> wrote:
> Hi Einar,
>
> How did you put the data onto S3: using Hadoop's S3 FileSystem, or
> using other S3 tools? If it's the latter, then it won't work, as the
> s3 scheme is for Hadoop's block-based S3 storage. Native S3 support
> is coming - see https://issues.apache.org/jira/browse/HADOOP-930 -
> but it's not integrated yet.
>
> Tom
>
> On Thu, May 29, 2008 at 10:15 PM, Einar Vollset
> <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> I'm using the current Hadoop EC2 image (ami-ee53b687), and am having
>> some trouble getting Hadoop to access S3. Specifically, I'm trying
>> to copy files from my bucket into HDFS on the running cluster, so
>> (on the master of the booted cluster) I do:
>>
>> hadoop-0.17.0 einar$ bin/hadoop distcp s3://ID:[EMAIL PROTECTED]/ input
>> 08/05/29 14:10:44 INFO util.CopyFiles: srcPaths=[s3://ID:[EMAIL PROTECTED]/]
>> 08/05/29 14:10:44 INFO util.CopyFiles: destPath=input
>> 08/05/29 14:10:46 WARN fs.FileSystem: "localhost:9000" is a deprecated
>> filesystem name. Use "hdfs://localhost:9000/" instead.
>> With failures, global counters are inaccurate; consider running with -i
>> Copy failed: org.apache.hadoop.mapred.InvalidInputException: Input
>> source s3://ID:[EMAIL PROTECTED]/ does not exist.
>>         at org.apache.hadoop.util.CopyFiles.checkSrcPath(CopyFiles.java:578)
>>         at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:594)
>>         at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:743)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>         at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:763)
>>
>> ...which clearly doesn't work. The ID and SECRET are right, because
>> if I change them I get:
>>
>> org.jets3t.service.S3ServiceException: S3 HEAD request failed.
>> ResponseCode=403, ResponseMessage=Forbidden
>> ...etc.
>>
>> I suspect it might be a more general problem, because if I do:
>>
>> bin/hadoop fs -ls s3://ID:[EMAIL PROTECTED]/
>>
>> I get:
>>
>> ls: Cannot access s3://ID:[EMAIL PROTECTED]/ :
>> No such file or directory.
>>
>> ...even though the bucket is there and has a lot of data in it.
>>
>> Any thoughts?
>>
>> Cheers,
>>
>> Einar
>>

--
Einar Vollset
Chief Scientist
Something Simpler Systems
690 - 220 Cambie St
Vancouver, BC V6B 2M9
Canada
ph: +1-778-987-4256
http://somethingsimpler.com
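
A minimal sketch of the sequence Tom describes, with ID, SECRET, and
BUCKET as placeholders and /local/data as a hypothetical local path:
the s3:// scheme only sees data that Hadoop's block-based S3 FileSystem
wrote, so the bucket has to be populated through Hadoop itself, not
through ordinary S3 tools.

# Write through Hadoop's block-based S3 FileSystem first; only data
# written this way is visible via the s3:// scheme.
bin/hadoop fs -put /local/data s3://ID:SECRET@BUCKET/data

# distcp can then read it back into HDFS on the cluster.
bin/hadoop distcp s3://ID:SECRET@BUCKET/data hdfs://localhost:9000/input

The credentials can also go in hadoop-site.xml as fs.s3.awsAccessKeyId
and fs.s3.awsSecretAccessKey instead of being embedded in the URI.
Buckets filled by other S3 tools only become readable once the native
S3 support from HADOOP-930 is integrated (it later shipped as the
s3n:// scheme).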