Re: public Amazon EC2 hadoop images for larger instances (m1.large and m1.xlarge)
Yes - of course the different architecture means the existing image won't work! I've created https://issues.apache.org/jira/browse/HADOOP-2411 to track this issue. See this thread for how to take advantage of extra CPUs: http://www.mail-archive.com/hadoop-user@lucene.apache.org/msg02377.html

Tom

On 12/12/2007, Thibaut Britz [EMAIL PROTECTED] wrote:
> Hi Tom,
>
> I got the following error: Client.InvalidParameterValue: The requested instance type does not agree with the architecture specified in the AMI manifest.
>
> In the meantime, I built a private AMI from the 64-bit version of Fedora with Hadoop 0.14.4. What do I have to do to take advantage of the extra CPUs and the RAM? (I only increased mapred.child.java.opts.)
>
> My map/reduce code is comparable to wordcount (7413862 combine input pairs, 122475 reduce input pairs) and doesn't do many calculations, but it still takes over 30 seconds to complete, while top shows the CPU to be idle about 99% of the time. (If I don't collect pairs in the map function, the job finishes in 3 seconds - just to test how long my RecordReader takes to deliver the key/value pairs.) Also, increasing the number of mappers dramatically reduces performance. (The above was measured with only 2 mappers on one m1.xlarge instance.) Any ideas on what causes the performance to be so low?
>
> Tom White wrote:
>> Hi Thibaut,
>>
>> Do you know why the existing hadoop images don't work with larger instance types? What's the error message you're getting?
>>
>> It should be relatively easy to change the launch-hadoop-cluster script to specify the instance type. Also, there's some work to be done to configure Hadoop on larger images to take advantage of the extra CPUs and memory.
>>
>> BTW I just published a 0.15.1 AMI.
>>
>> Tom
>>
>> On 11/12/2007, Thibaut Britz [EMAIL PROTECTED] wrote:
>>> Hi,
>>>
>>> The current public images only work on the smaller instances. It would be very helpful (and save me some time) if someone would be so kind as to create or publish their hadoop image.
Thibaut

--
View this message in context: http://www.nabble.com/public-Amazon-EC2-hadoop-images-for-larger-instances-%28m1.large-and-m1.xlarge%29-tp14276807p14296045.html
Sent from the Hadoop Users mailing list archive at Nabble.com.

--
Blog: http://problemsworthyofattack.blogspot.com/
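The tuning Thibaut asks about lives in hadoop-site.xml overrides. A minimal sketch for a larger EC2 instance, assuming Hadoop 0.15-era property names; the values are illustrative guesses, not recommendations taken from this thread:

```xml
<!-- hadoop-site.xml: illustrative overrides for a larger EC2 instance.
     Property names are from the Hadoop 0.15 era; the values here are
     guesses for illustration, not tuned recommendations. -->
<configuration>
  <property>
    <name>mapred.tasktracker.tasks.maximum</name>
    <!-- Run more concurrent map/reduce tasks per node to use the extra CPUs
         (the default of 2 leaves an m1.xlarge mostly idle). -->
    <value>4</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <!-- Give each task child JVM a larger heap to use the extra RAM. -->
    <value>-Xmx1024m</value>
  </property>
</configuration>
```

Raising only mapred.child.java.opts (as Thibaut did) adds heap per task but does not add task slots, which is why the extra CPUs can still sit idle.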
public Amazon EC2 hadoop images for larger instances (m1.large and m1.xlarge)
Hi,

The current public images only work on the smaller instances. It would be very helpful (and save me some time) if someone would be so kind as to create or publish their hadoop image.

Thibaut
Re: [Article] Running Hadoop MapReduce on Amazon EC2 and Amazon S3
Great stuff.

If anybody is going to be at OSCON next week, besides Doug Cutting's talk I would encourage you to check out the BoF "Using Amazon Webservices EC2/S3/SQS for computing on large data sets": Developers share their experience using EC2/S3/SQS. Come talk about performance, size of data set, problems w/ EC2/S3, favorite programming language/model (Hadoop - anyone?), tricks, tips, hacks, cool apps, tools, api usage, etc. http://conferences.oreillynet.com/cs/os2007/view/e_sess/14816

derek

On 7/20/07, Matt Kangas [EMAIL PROTECTED] wrote:
> +1 to Stu's assessment! Very compelling article, Tom. Thanks also for the code that makes it possible.
>
> --matt
>
> On Jul 19, 2007, at 4:56 PM, Tom White wrote:
>> All except the link... http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873&categoryID=112
>>
>> On 19/07/07, Tom White [EMAIL PROTECTED] wrote:
>>> The title pretty much says it all, although I would say that it might be of interest even if you're not using Amazon Web Services.
>>>
>>> Tom
>
> --
> Matt Kangas / [EMAIL PROTECTED]
[Article] Running Hadoop MapReduce on Amazon EC2 and Amazon S3
The title pretty much says it all, although I would say that it might be of interest even if you're not using Amazon Web Services.

Tom
Using Hadoop on Amazon EC2
I just added a new wiki page describing how I was able to use Hadoop on Amazon's EC2 computing infrastructure. If others test this, please help improve it.

http://wiki.apache.org/lucene-hadoop/AmazonEC2

Thanks,
Doug
Amazon EC2
Has anyone tried running Hadoop on the Amazon Elastic Compute Cloud yet? http://www.amazon.com/gp/browse.html?node=201590011

One way to use Hadoop on this would be to:

1. Allocate a pool of machines.
2. Start Hadoop daemons.
3. Load the HDFS filesystem with input from Amazon S3.
4. Run a series of MapReduce computations.
5. Copy the final output from HDFS back to Amazon S3.
6. Deallocate the machines.

Steps (3) and (5) could be eliminated if a Hadoop FileSystem were implemented on S3, so that input and output could be accessed directly from S3. (One might still use HDFS for intermediate data, as it should be faster.)

The prices seem very reasonable. 100 nodes for 10 hours costs $100. Storing a terabyte on S3 costs $150/month. Transferring a terabyte of offsite data (e.g., fetching 100M pages) costs $200. So someone could use Nutch to keep a 100M-page crawl, refreshed monthly, for around $500/month. Such a crawl could be shared with other organizations, who would themselves pay, a la carte, for their computations over it.

If anyone tries Hadoop on EC2, please tell us how it goes.

Doug
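Doug's dollar figures can be checked with a quick back-of-the-envelope calculation. The per-unit prices below are inferred from his totals (they correspond to $0.10 per instance-hour, $0.15 per GB-month of storage, and $0.20 per GB transferred), not stated in the message itself:

```python
# Back-of-the-envelope check of the EC2/S3 costs quoted in the message.
# The per-unit prices are inferred from the stated totals, not from the
# message itself; treat them as assumptions for illustration.
EC2_DOLLARS_PER_INSTANCE_HOUR = 0.10
S3_DOLLARS_PER_GB_MONTH = 0.15
S3_DOLLARS_PER_GB_TRANSFERRED = 0.20

compute = 100 * 10 * EC2_DOLLARS_PER_INSTANCE_HOUR   # 100 nodes for 10 hours
storage = 1000 * S3_DOLLARS_PER_GB_MONTH             # ~1 TB stored for a month
transfer = 1000 * S3_DOLLARS_PER_GB_TRANSFERRED      # ~1 TB fetched from offsite

print(f"compute ${compute:.0f}, storage ${storage:.0f}/mo, transfer ${transfer:.0f}")
print(f"monthly total ~${compute + storage + transfer:.0f}")
```

The three line items come to about $450/month, consistent with the "around $500/month" figure once you allow some extra compute for the monthly refresh.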