On Tue, Nov 24, 2009 at 9:27 PM, Mark Kerzner <[email protected]> wrote:
> Yes, Tom, I saw all these problems. I think that I should stop trying to
> imitate EMR - that's where the idea of storing data on S3 came from - and
> instead transfer the data directly to the Hadoop cluster. Then I will be
> using everything as intended.
>
> Is there a way to scp directly to HDFS, or do I need to scp to local
> storage on some machine and then copy to HDFS?

distcp is the appropriate tool for this. There is some guidance on
http://wiki.apache.org/hadoop/AmazonS3.
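With the s3n access and secret keys set in the configuration as in my
earlier mail below, something along these lines should work (the bucket
name and paths are placeholders; the destination is a path on HDFS when
the command is run from the cluster):

  bin/hadoop distcp s3n://<BUCKET>/path/to/logs logs

If you'd rather not put the keys in the configuration files, passing them
on the distcp command line as generic options, e.g.
-Dfs.s3n.awsAccessKeyId=ID -Dfs.s3n.awsSecretAccessKey=SECRET, should also
work.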
> Also, is there a way to make
> the master a bigger instance than that of the slaves?

No, this is not supported, but I can see it would be useful, particularly
for larger clusters. Please consider opening a JIRA for it.

Cheers,
Tom

>
> Thank you,
> Mark
>
> On Tue, Nov 24, 2009 at 11:20 PM, Tom White <[email protected]> wrote:
>
>> Mark,
>>
>> If the data was transferred to S3 outside of Hadoop then you should
>> use the s3n filesystem scheme (see the explanation on
>> http://wiki.apache.org/hadoop/AmazonS3 for the differences between the
>> Hadoop S3 filesystems).
>>
>> Also, some people have had problems embedding the secret key in the
>> URI, so you can set it in the configuration as follows:
>>
>> <property>
>>   <name>fs.s3n.awsAccessKeyId</name>
>>   <value>ID</value>
>> </property>
>>
>> <property>
>>   <name>fs.s3n.awsSecretAccessKey</name>
>>   <value>SECRET</value>
>> </property>
>>
>> Then use a URI of the form s3n://<BUCKET>/path/to/logs
>>
>> Cheers,
>> Tom
>>
>> On Tue, Nov 24, 2009 at 5:47 PM, Mark Kerzner <[email protected]>
>> wrote:
>> > Hi,
>> >
>> > I need to copy data from S3 to HDFS. This instruction
>> >
>> > bin/hadoop distcp s3://<ID>:<SECRET>@<BUCKET>/path/to/logs logs
>> >
>> > does not seem to work.
>> >
>> > Thank you.
>> >
>> >
