By editing hadoop-site.xml you set the default filesystem, but I don't
recommend changing the default on EC2.
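For reference, changing the default would look something like this in hadoop-site.xml (bucket name and credentials are placeholders; alternatively, the AWS credentials can be embedded in the s3:// URI itself):

```xml
<!-- make the S3 block filesystem the default -->
<property>
  <name>fs.default.name</name>
  <value>s3://your-bucket</value>
</property>
<!-- AWS credentials for the S3 block filesystem -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```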
But you can specify the filesystem to use for a particular job through
the URL that references your data (jobConf.addInputPath etc). In the
case of the S3 block filesystem, just use an s3:// URL.
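A rough sketch of the per-job form (bucket and paths are hypothetical; in 0.17 the static FileInputFormat/FileOutputFormat methods are the non-deprecated way to set paths, but the older JobConf.addInputPath works the same way):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

public class S3JobSketch {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(S3JobSketch.class);
        conf.setJobName("s3-block-fs-example");
        // The URI scheme selects the FileSystem implementation for this job:
        // s3:// for the S3 block filesystem, hdfs:// for HDFS.
        FileInputFormat.setInputPaths(conf, new Path("s3://your-bucket/input"));
        FileOutputFormat.setOutputPath(conf, new Path("s3://your-bucket/output"));
        // ... set mapper/reducer classes, then submit with JobClient.runJob(conf)
    }
}
```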
ckw
On Jun 30, 2008, at 8:04 PM, slitz wrote:
Hello,
I've been trying to set up Hadoop to use S3 as the filesystem. I read
in the wiki that it's possible to choose either the S3 native
filesystem or the S3 block filesystem. I would like to use the S3
block filesystem to avoid the task of "manually" transferring data
from S3 to HDFS every time I want to run a job.
I'm still experimenting with the EC2 contrib scripts, and those seem
to be excellent.
What I can't understand is how it may be possible to use S3 with a
public Hadoop AMI, since from my understanding hadoop-site.xml gets
written on each instance startup with the options in hadoop-init, and
it seems that the public AMI (at least the 0.17.0 one) is not
configured to use S3 at all (which makes sense, because the bucket
would need individual configuration anyway).
So... to use the S3 block filesystem with EC2 I need to create a
custom AMI with a modified hadoop-init script, right? Or am I
completely confused?
slitz
--
Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/