That's a good point; in fact, it didn't occur to me that I could access it like that. But some questions came to my mind:
How do I put something into the FS? Something like "bin/hadoop fs -put input input" will not work well since S3 is not the default FS, so I tried

bin/hadoop fs -put input s3://ID:[EMAIL PROTECTED]/input

(and some variations of it), but it didn't work; I always got an error complaining that I hadn't provided the ID/secret for S3.

To experiment a little, I tried to edit conf/hadoop-site.xml (something that's only feasible while experimenting, since these changes don't persist unless a new AMI is created), added the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties, and changed fs.default.name to an s3:// one. This worked for things like:

mkdir input
cp conf/*.xml input
bin/hadoop fs -put input input
bin/hadoop fs -ls input

But then I hit another problem: when I tried to run

bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

which should now be able to run using S3 as the FileSystem, I got this error:

08/07/01 22:12:55 INFO mapred.FileInputFormat: Total input paths to process : 2
08/07/01 22:12:57 INFO mapred.JobClient: Running job: job_200807012133_0010
08/07/01 22:12:58 INFO mapred.JobClient: map 100% reduce 100%
java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1062)
(...)

I tried several times, and with the wordcount example as well, but the error was always the same. What could be the problem here? And how can I access the FileSystem with "bin/hadoop fs ..." if the default filesystem isn't S3? (A rough sketch of the per-job URL approach is appended after the quoted thread below.)

Thank you very much :)

slitz

On Tue, Jul 1, 2008 at 4:43 PM, Chris K Wensel <[EMAIL PROTECTED]> wrote:

> By editing hadoop-site.xml you set the default, but I don't recommend
> changing the default on EC2.
>
> However, you can specify the filesystem to use through the URL that
> references your data (jobConf.addInputPath etc.) for a particular job. In
> the case of the S3 block filesystem, just use an s3:// URL.
>
> ckw
>
> On Jun 30, 2008, at 8:04 PM, slitz wrote:
>
>> Hello,
>> I've been trying to set up Hadoop to use S3 as the filesystem. I read in
>> the wiki that it's possible to choose either the S3 native FileSystem or
>> the S3 block FileSystem. I would like to use the S3 block FileSystem to
>> avoid the task of "manually" transferring data from S3 to HDFS every time
>> I want to run a job.
>>
>> I'm still experimenting with the EC2 contrib scripts and those seem to be
>> excellent. What I can't understand is how it may be possible to use S3
>> with a public Hadoop AMI, since from my understanding hadoop-site.xml gets
>> written on each instance startup with the options from hadoop-init, and it
>> seems that the public AMI (at least the 0.17.0 one) is not configured to
>> use S3 at all (which makes sense, because the bucket would need individual
>> configuration anyway).
>>
>> So... to use the S3 block FileSystem with EC2 I need to create a custom
>> AMI with a modified hadoop-init script, right? Or am I completely confused?
>>
>> slitz
>
> --
> Chris K Wensel
> [EMAIL PROTECTED]
> http://chris.wensel.net/
> http://www.cascading.org/
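
A minimal sketch of the per-job approach Chris describes above, assuming the 0.17-era mapred API: leave fs.default.name pointing at HDFS and point the job's input and output at s3:// URLs directly, supplying the AWS credentials as configuration properties rather than embedding them in the URL. The class name, bucket name, and the MY_ID/MY_SECRET placeholders are illustrative only, not from the thread.

    // Sketch only: per-job S3 input/output with the old (0.17-era) mapred API.
    // MY_ID, MY_SECRET and my-bucket are placeholders. With no mapper or
    // reducer configured, Hadoop falls back to the identity implementations,
    // so this job simply pulls the S3 input through MapReduce and writes it
    // back to S3.
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class S3UrlJob {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(S3UrlJob.class);
        conf.setJobName("s3-url-example");

        // Credentials for the S3 block filesystem; these could also live in
        // hadoop-site.xml instead of in code.
        conf.set("fs.s3.awsAccessKeyId", "MY_ID");
        conf.set("fs.s3.awsSecretAccessKey", "MY_SECRET");

        // The URL scheme decides which FileSystem each path uses, so the
        // cluster's default filesystem (HDFS) stays untouched.
        conf.addInputPath(new Path("s3://my-bucket/input"));
        conf.setOutputPath(new Path("s3://my-bucket/output"));

        JobClient.runJob(conf);
      }
    }

With the same two credential properties placed in hadoop-site.xml (and fs.default.name left alone), the shell should also accept fully qualified URIs, e.g. bin/hadoop fs -ls s3://my-bucket/input, which addresses the "bin/hadoop fs ..." question without making S3 the default filesystem.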