Mark,

If the data was transferred to S3 outside of Hadoop, then you should use the s3n filesystem scheme (see the explanation on http://wiki.apache.org/hadoop/AmazonS3 for the differences between the Hadoop S3 filesystems).
Also, some people have had problems embedding the secret key in the URI, so you can set it in the configuration as follows:

  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>ID</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>SECRET</value>
  </property>

Then use a URI of the form s3n://<BUCKET>/path/to/logs

Cheers,
Tom

On Tue, Nov 24, 2009 at 5:47 PM, Mark Kerzner <[email protected]> wrote:
> Hi,
>
> I need to copy data from S3 to HDFS. This instruction
>
> bin/hadoop distcp s3://<ID>:<SECRET>@<BUCKET>/path/to/logs logs
>
> does not seem to work.
>
> Thank you.
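
P.S. Putting the pieces together, a sketch of the full invocation might look like the following. This assumes the keys are set either in core-site.xml as above or passed inline with -D (DistCp implements the Tool interface, so generic options are accepted); ID, SECRET, and BUCKET are placeholders for your own values.

  # Copy from S3 (s3n scheme) into the HDFS directory "logs".
  # The -D properties are only needed if the keys are not already
  # set in core-site.xml.
  bin/hadoop distcp \
    -Dfs.s3n.awsAccessKeyId=ID \
    -Dfs.s3n.awsSecretAccessKey=SECRET \
    s3n://BUCKET/path/to/logs logs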
