Yes, Tom, I saw all these problems. I think I should stop trying to imitate EMR (that is where the idea of storing data on S3 came from) and instead transfer the data directly to the Hadoop cluster. Then I will be using everything as intended.
Is there a way to scp directly to HDFS, or do I need to scp to local
storage on some machine and then copy from there to HDFS? (A sketch of one
streaming approach follows the quoted thread below.) Also, is there a way
to make the master a bigger instance type than the slaves?

Thank you,
Mark

On Tue, Nov 24, 2009 at 11:20 PM, Tom White <[email protected]> wrote:

> Mark,
>
> If the data was transferred to S3 outside of Hadoop then you should
> use the s3n filesystem scheme (see the explanation on
> http://wiki.apache.org/hadoop/AmazonS3 for the differences between the
> Hadoop S3 filesystems).
>
> Also, some people have had problems embedding the secret key in the
> URI, so you can set it in the configuration as follows:
>
> <property>
>   <name>fs.s3n.awsAccessKeyId</name>
>   <value>ID</value>
> </property>
>
> <property>
>   <name>fs.s3n.awsSecretAccessKey</name>
>   <value>SECRET</value>
> </property>
>
> Then use a URI of the form s3n://<BUCKET>/path/to/logs
>
> Cheers,
> Tom
>
> On Tue, Nov 24, 2009 at 5:47 PM, Mark Kerzner <[email protected]> wrote:
> > Hi,
> >
> > I need to copy data from S3 to HDFS. This instruction
> >
> >     bin/hadoop distcp s3://<ID>:<SECRET>@<BUCKET>/path/to/logs logs
> >
> > does not seem to work.
> >
> > Thank you.
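On the scp question: HDFS has no scp endpoint of its own, but the
intermediate copy on local disk can be avoided by streaming over ssh into
"hadoop fs -put -", which reads from stdin. A minimal sketch, not from the
thread itself; the user name, host name "master", and both paths are
placeholders, and it assumes the hadoop binary is on the remote PATH:

    # Stream a local archive over ssh straight into HDFS on the master.
    # "hadoop fs -put -" reads from stdin, so nothing is staged on the
    # master's local disk. User, host, and paths are placeholders.
    cat /local/path/to/logs.tar.gz | \
        ssh hadoop@master "hadoop fs -put - /user/mark/logs.tar.gz"

The same idea works in reverse with "hadoop fs -cat" on the remote side for
pulling data back out of HDFS through a pipe.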

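Putting Tom's suggestion together: once the two fs.s3n.* properties above
are set in the configuration (core-site.xml on Hadoop 0.20, hadoop-site.xml
on earlier releases), the keys can be dropped from the URI and the failing
command becomes something like the following; the HDFS destination path is
purely illustrative:

    # distcp from S3 using the s3n scheme; credentials come from the
    # configuration, not the URI. /user/mark/logs is a placeholder.
    bin/hadoop distcp s3n://<BUCKET>/path/to/logs /user/mark/logs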