That's a good point; in fact, it didn't occur to me that I could access it like that. But some questions came to my mind:
How do I put something into the FS? Something like "bin/hadoop fs -put input input" will not work well since S3 is not the default FS, so I tried

bin/hadoop fs -put input s3://ID:[EMAIL PROTECTED]/input

(and some variations of it), but it didn't work; I always got an error complaining that I hadn't provided the ID/secret for S3.

To experiment a little, I tried to edit conf/hadoop-site.xml (something that's only feasible while experimenting, since these changes don't persist unless a new AMI is created), added the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties, and changed fs.default.name to an s3:// one. This worked for things like:

mkdir input
cp conf/*.xml input
bin/hadoop fs -put input input
bin/hadoop fs -ls input

But then I hit another problem: when I tried to run

bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

which should now be able to run using S3 as the FileSystem, I got this error:

08/07/01 22:12:55 INFO mapred.FileInputFormat: Total input paths to process : 2
08/07/01 22:12:57 INFO mapred.JobClient: Running job: job_200807012133_0010
08/07/01 22:12:58 INFO mapred.JobClient: map 100% reduce 100%
java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1062)
(...)

I tried several times, and with the wordcount example as well, but the error was always the same. What could be the problem here? And how can I access the FileSystem with "bin/hadoop fs ..." if the default filesystem isn't S3? (A rough sketch of the per-job URL approach is appended after the quoted thread below.)

Thank you very much :)

slitz

On Tue, Jul 1, 2008 at 4:43 PM, Chris K Wensel <[EMAIL PROTECTED]> wrote:

> By editing hadoop-site.xml you set the default, but I don't recommend
> changing the default on EC2.
>
> However, you can specify the filesystem to use through the URL that
> references your data (jobConf.addInputPath etc.) for a particular job. In
> the case of the S3 block filesystem, just use an s3:// URL.
>
> ckw
>
> On Jun 30, 2008, at 8:04 PM, slitz wrote:
>
>> Hello,
>> I've been trying to set up Hadoop to use S3 as the filesystem. I read in
>> the wiki that it's possible to choose either the S3 native FileSystem or
>> the S3 block FileSystem. I would like to use the S3 block FileSystem to
>> avoid the task of "manually" transferring data from S3 to HDFS every time
>> I want to run a job.
>>
>> I'm still experimenting with the EC2 contrib scripts and those seem to be
>> excellent. What I can't understand is how it may be possible to use S3
>> with a public Hadoop AMI, since from my understanding hadoop-site.xml gets
>> written on each instance startup with the options from hadoop-init, and it
>> seems that the public AMI (at least the 0.17.0 one) is not configured to
>> use S3 at all (which makes sense, because the bucket would need individual
>> configuration anyway).
>>
>> So... to use the S3 block FileSystem with EC2 I need to create a custom
>> AMI with a modified hadoop-init script, right? Or am I completely confused?
>>
>> slitz
>
> --
> Chris K Wensel
> [EMAIL PROTECTED]
> http://chris.wensel.net/
> http://www.cascading.org/
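
A minimal sketch of the per-job approach Chris describes above, assuming the 0.17-era mapred API: leave fs.default.name pointing at HDFS and point the job's input and output at s3:// URLs directly, supplying the AWS credentials as configuration properties rather than embedding them in the URL. The class name, bucket name, and the MY_ID/MY_SECRET placeholders are illustrative only, not from the thread.

    // Sketch only: per-job S3 input/output with the old (0.17-era) mapred API.
    // MY_ID, MY_SECRET and my-bucket are placeholders. With no mapper or
    // reducer configured, Hadoop falls back to the identity implementations,
    // so this job simply pulls the S3 input through MapReduce and writes it
    // back to S3.
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class S3UrlJob {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(S3UrlJob.class);
        conf.setJobName("s3-url-example");

        // Credentials for the S3 block filesystem; these could also live in
        // hadoop-site.xml instead of in code.
        conf.set("fs.s3.awsAccessKeyId", "MY_ID");
        conf.set("fs.s3.awsSecretAccessKey", "MY_SECRET");

        // The URL scheme decides which FileSystem each path uses, so the
        // cluster's default filesystem (HDFS) stays untouched.
        conf.addInputPath(new Path("s3://my-bucket/input"));
        conf.setOutputPath(new Path("s3://my-bucket/output"));

        JobClient.runJob(conf);
      }
    }

With the same two credential properties placed in hadoop-site.xml (and fs.default.name left alone), the shell should also accept fully qualified URIs, e.g. bin/hadoop fs -ls s3://my-bucket/input, which addresses the "bin/hadoop fs ..." question without making S3 the default filesystem.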