Re: public Amazon EC2 hadoop images for larger instances (m1.large and m1.xlarge)

2007-12-12 Thread Tom White
Yes - of course the different architecture means the existing image
won't work! I've created
https://issues.apache.org/jira/browse/HADOOP-2411 to track this issue.
See this thread for how to take advantage of extra CPUs:
http://www.mail-archive.com/hadoop-user@lucene.apache.org/msg02377.html.
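
For example, raising the per-node task count and the child JVM heap in
hadoop-site.xml is the general idea. A minimal sketch, assuming the
0.15-era property names; the values are only illustrative, not tuned
recommendations:

  <property>
    <name>mapred.tasktracker.tasks.maximum</name>
    <!-- illustrative: run more concurrent tasks per node to use the
         extra cores -->
    <value>4</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <!-- illustrative: give each task's child JVM a larger heap to use
         the extra memory -->
    <value>-Xmx1024m</value>
  </property>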

Tom

On 12/12/2007, Thibaut Britz [EMAIL PROTECTED] wrote:

 Hi Tom,

 I got the following error: "Client.InvalidParameterValue: The requested
 instance type does not agree with the architecture specified in the AMI
 manifest."
 In the meantime, I built a private AMI from the 64-bit version of Fedora
 with Hadoop version 0.14.4.

 What do I have to do to take advantage of the extra CPUs and the RAM (I
 have only increased mapred.child.java.opts)?


 My map/reduce code is comparable to wordcount (7,413,862 combine input
 pairs, 122,475 reduce input pairs) and doesn't do many calculations, but it
 still takes over 30 seconds to complete, while top shows the CPU as idle
 about 99% of the time. (If I don't collect pairs in the map function, the
 job finishes in 3 seconds; that test just measures how long my RecordReader
 takes to deliver the key/value pairs.) Also, increasing the number of
 mappers dramatically reduces performance. (The above was measured with only
 2 mappers on one m1.xlarge instance.) Any ideas on what causes the
 performance to be so low?




 Tom White wrote:
 
  Hi Thibaut,
 
  Do you know why the existing hadoop images don't work with larger
  instance types? What's the error message you're getting?
 
  It should be relatively easy to change the launch-hadoop-cluster
  script to specify the instance type. Also, there's some work to be
  done to configure Hadoop on larger images to take advantage of the
  extra CPUs and memory.
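  
   Something along these lines ought to work with the standard EC2
   command-line tools (the AMI ID and keypair name below are
   placeholders; -t selects the instance type):
  
     ec2-run-instances ami-XXXXXXXX -n 10 -k gsg-keypair -t m1.xlarge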
 
  BTW I just published a 0.15.1 AMI.
 
  Tom
 
  On 11/12/2007, Thibaut Britz [EMAIL PROTECTED] wrote:
 
  Hi,
 
  The current public images only work on the smaller instances.
  It would be very helpful (and save me some time) if someone would be so
  kind as to create and publish their Hadoop image.
 
  Thibaut
 
 
 
 
 
  --
  Blog: http://problemsworthyofattack.blogspot.com/
 
 





-- 
Blog: http://problemsworthyofattack.blogspot.com/


public Amazon EC2 hadoop images for larger instances (m1.large and m1.xlarge)

2007-12-11 Thread Thibaut Britz

Hi,

The current public images only work on the smaller instances.
It would be very helpful (and save me some time) if someone would be so kind
as to create and publish their Hadoop image.

Thibaut




Re: [Article] Running Hadoop MapReduce on Amazon EC2 and Amazon S3

2007-07-20 Thread Derek Gottfrid

Great stuff. If anybody is going to be at OSCON next week, then besides Doug
Cutting's talk I would encourage you to check out the BoF:

Using Amazon Webservices EC2/S3/SQS for computing on large data sets

Developers share their experience using EC2/S3/SQS. Come talk about
performance, sizes of data sets, problems with EC2/S3, favorite
programming languages/models (Hadoop - anyone?), tricks, tips, hacks,
cool apps, tools, API usage, etc.

http://conferences.oreillynet.com/cs/os2007/view/e_sess/14816

derek

On 7/20/07, Matt Kangas [EMAIL PROTECTED] wrote:

+1 to Stu's assessment! Very compelling article, Tom. Thanks also for
the code that makes it possible.

--matt

On Jul 19, 2007, at 4:56 PM, Tom White wrote:

 All except the link...

 http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873&categoryID=112

 On 19/07/07, Tom White [EMAIL PROTECTED] wrote:
 The title pretty much says it all, although I would say that it might
 be of interest even if you're not using Amazon Web Services.

 Tom


--
Matt Kangas / [EMAIL PROTECTED]





[Article] Running Hadoop MapReduce on Amazon EC2 and Amazon S3

2007-07-19 Thread Tom White

The title pretty much says it all, although I would say that it might
be of interest even if you're not using Amazon Web Services.

Tom


Using Hadoop on Amazon EC2

2006-10-27 Thread Doug Cutting
I just added a new wiki page describing how I was able to use Hadoop on 
Amazon's EC2 computing infrastructure.  If others test this, please help 
improve it.


http://wiki.apache.org/lucene-hadoop/AmazonEC2

Thanks,

Doug


Amazon EC2

2006-08-25 Thread Doug Cutting

Has anyone tried running Hadoop on the Amazon Elastic Compute Cloud yet?

http://www.amazon.com/gp/browse.html?node=201590011

One way to use Hadoop on this would be to:

1. Allocate a pool of machines.
2. Start Hadoop daemons.
3. Load the HDFS filesystem with input from Amazon S3.
4. Run a series of MapReduce computations.
5. Copy the final output from HDFS back to Amazon S3.
6. Deallocate the machines.

Steps (3) and (5) could be eliminated if a Hadoop FileSystem were
implemented on S3, so that input and output could be accessed directly
from S3.  (One might still use HDFS for intermediate data, as it should
be faster.)
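
If such a FileSystem existed and were registered under an s3: URI
scheme, pointing a job directly at S3 might look something like this
hypothetical sketch (the scheme, bucket, and paths are placeholders,
not a real API yet):

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  // Hypothetical: assumes a Hadoop FileSystem registered for s3: URIs,
  // which does not exist yet.  Bucket and paths are placeholders.
  public class S3JobSketch {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(S3JobSketch.class);
      conf.setJobName("s3-direct");
      // Read input directly from S3, eliminating step (3) above...
      conf.setInputPath(new Path("s3://my-bucket/input"));
      // ...and write the final output directly back, eliminating step (5).
      conf.setOutputPath(new Path("s3://my-bucket/output"));
      JobClient.runJob(conf);
    }
  }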

The prices seem very reasonable.  100 nodes for 10 hours costs $100.
Storing a terabyte on S3 costs $150/month.  Transferring a terabyte of
offsite data (e.g., fetching 100M pages) costs $200.  So someone could
use Nutch to keep a 100M page crawl, refreshed monthly, for around
$500/month.  Such a crawl could be shared with other organizations who
would themselves pay, a la carte, for their computations over it.
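
(Spelling out the arithmetic behind those figures, using the rates they
imply: $0.10 per instance-hour, $0.15 per GB-month of storage, and $0.20
per GB transferred:

  compute:   100 nodes x 10 hours x $0.10/hour  = $100 per run
  storage:   1,000 GB x $0.15/GB-month          = $150 per month
  transfer:  1,000 GB x $0.20/GB                = $200 per refresh

A monthly refresh is thus roughly $100 + $150 + $200 = $450, i.e. around
$500/month.)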

If anyone tries Hadoop on EC2, please tell us how it goes.

Doug