We used the small instances and the difference was around 5x-8x, depending
on what we tried to run. I'm really surprised that large instances have
such bad performance characteristics.


D.
--------------
Attributor-publish with confidence
We are still hiring developers....





-----Original Message-----
From: Nate Carlson [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, April 09, 2008 7:07 PM
To: core-user@hadoop.apache.org
Subject: Hadoop performance on EC2?

Hey all,

We've got a job that we're running in both a development environment and
out on EC2.  I've been rather displeased with the performance on EC2, and
was curious whether the results we've been seeing are similar to other
people's, or if I've got something misconfigured.  ;)  In both
environments, the load on the master node is around 1-1.5, and the load on
the slave nodes is around 8-10. I have also tried cranking up the JVM
memory on the EC2 nodes (since we've got RAM to blow), with very little
performance difference.

Basically, the job takes about 3.5 hours on development, but takes 15
hours on EC2. The portion that takes all the time is not dependent on
any external hosts - just the MySQL server on the master node.  I
benchmarked the VCPUs between our dev and EC2, and they are about
equivalent. I would expect EC2 to take 1.5x as long, since there is
one less CPU per slave, but it's taking much longer than that.
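
For reference, the arithmetic behind that 1.5x expectation can be sketched
as below. This is only a rough sketch: it assumes the job scales linearly
with the number of VCPUs per slave and that the VCPUs really are
equivalent, as benchmarked. The variable names are made up for
illustration; the figures are the ones quoted in this message.

```python
# Figures quoted in the message.
dev_hours = 3.5        # job runtime on development
ec2_hours = 15.0       # job runtime on EC2
dev_slave_vcpus = 3    # VCPUs per development slave
ec2_slave_vcpus = 2    # VCPUs per EC2 "Large" slave

# Assumption: runtime is inversely proportional to VCPUs per slave.
expected_slowdown = dev_slave_vcpus / ec2_slave_vcpus
expected_ec2_hours = dev_hours * expected_slowdown

# What was actually measured.
observed_slowdown = ec2_hours / dev_hours

# Slowdown left over after accounting for the missing CPU.
unexplained_factor = observed_slowdown / expected_slowdown

print(f"expected slowdown:  {expected_slowdown:.2f}x "
      f"(about {expected_ec2_hours:.2f} hours)")
print(f"observed slowdown:  {observed_slowdown:.2f}x")
print(f"unexplained factor: {unexplained_factor:.2f}x")
```

So under that assumption roughly a 2.9x slowdown is still unaccounted for
after allowing for the smaller CPU count.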

Appreciate any tips!

Similarities between the environments:
- 1 master node, 2 slave nodes
- 1 mapper and reducer on the master, 8 mappers and 7 reducers on the
   slaves
- Hadoop 0.16.2
- Local HDFS storage (we were using S3 on Amazon before, and I switched to
   local storage)
- MySQL database running on the master node
- Xen VM's in both environments (our own Xen for dev, Amazon's for EC2)
- Debian Etch 64-bit OS; 64-bit JVM

Development master node configuration:
- 4x VCPU's (Xeon E5335 2ghz)
- 3gb memory
- 4gb swap

Development slave nodes configuration:
- 3x VCPU's (Xeon E5335 2ghz)
- 2gb memory
- 4gb swap

EC2 Configuration ("Large" instance type):
- 2x VCPU's (Opteron 2ghz)
- 8gb memory
- 4gb swap
- All nodes running in the same availability zone

------------------------------------------------------------------------
| nate carlson | [EMAIL PROTECTED] | http://www.natecarlson.com |
|       depriving some poor village of its idiot since 1981            |
------------------------------------------------------------------------
