Yes, Tom, I saw all these problems. I think I should stop trying to imitate EMR (that is where the idea of storing data on S3 came from) and instead transfer the data directly to the Hadoop cluster. Then I will be using everything as intended.
Is there a way to scp directly to HDFS, or do I need to scp to local
storage on some machine and then copy from there to HDFS? (A sketch of one
streaming approach follows the quoted thread below.) Also, is there a way
to make the master a bigger instance type than the slaves?

Thank you,
Mark

On Tue, Nov 24, 2009 at 11:20 PM, Tom White <[email protected]> wrote:

> Mark,
>
> If the data was transferred to S3 outside of Hadoop then you should
> use the s3n filesystem scheme (see the explanation on
> http://wiki.apache.org/hadoop/AmazonS3 for the differences between the
> Hadoop S3 filesystems).
>
> Also, some people have had problems embedding the secret key in the
> URI, so you can set it in the configuration as follows:
>
> <property>
>   <name>fs.s3n.awsAccessKeyId</name>
>   <value>ID</value>
> </property>
>
> <property>
>   <name>fs.s3n.awsSecretAccessKey</name>
>   <value>SECRET</value>
> </property>
>
> Then use a URI of the form s3n://<BUCKET>/path/to/logs
>
> Cheers,
> Tom
>
> On Tue, Nov 24, 2009 at 5:47 PM, Mark Kerzner <[email protected]> wrote:
> > Hi,
> >
> > I need to copy data from S3 to HDFS. This instruction
> >
> >     bin/hadoop distcp s3://<ID>:<SECRET>@<BUCKET>/path/to/logs logs
> >
> > does not seem to work.
> >
> > Thank you.
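On the scp question: HDFS has no scp endpoint of its own, but the
intermediate copy on local disk can be avoided by streaming over ssh into
"hadoop fs -put -", which reads from stdin. A minimal sketch, not from the
thread itself; the user name, host name "master", and both paths are
placeholders, and it assumes the hadoop binary is on the remote PATH:

    # Stream a local archive over ssh straight into HDFS on the master.
    # "hadoop fs -put -" reads from stdin, so nothing is staged on the
    # master's local disk. User, host, and paths are placeholders.
    cat /local/path/to/logs.tar.gz | \
        ssh hadoop@master "hadoop fs -put - /user/mark/logs.tar.gz"

The same idea works in reverse with "hadoop fs -cat" on the remote side for
pulling data back out of HDFS through a pipe.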

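Putting Tom's suggestion together: once the two fs.s3n.* properties above
are set in the configuration (core-site.xml on Hadoop 0.20, hadoop-site.xml
on earlier releases), the keys can be dropped from the URI and the failing
command becomes something like the following; the HDFS destination path is
purely illustrative:

    # distcp from S3 using the s3n scheme; credentials come from the
    # configuration, not the URI. /user/mark/logs is a placeholder.
    bin/hadoop distcp s3n://<BUCKET>/path/to/logs /user/mark/logs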