a few things..
make sure all nodes are running in the same 'availability zone',
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1347
and that you are using the new xen kernels.
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1353&categoryID=101
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1354&categoryID=101
also, make sure each node is addressing its peers via the ec2 private
addresses, not the public ones.
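a quick way to sanity-check the peer addresses, as a rough sketch only: the helper name `public_peers` and the sample addresses are made up for illustration, and it only handles literal ip addresses (hostnames would have to be resolved first). it uses python's standard `ipaddress` module to flag anything outside the private ranges.

```python
import ipaddress

def public_peers(addresses):
    """Return the addresses that are NOT in a private range."""
    return [a for a in addresses if not ipaddress.ip_address(a).is_private]

# hypothetical peer list, as it might be copied out of a node's config
peers = ["10.251.27.10", "10.251.43.7", "75.101.128.5"]
flagged = public_peers(peers)
print(flagged)  # anything listed here is being reached via a public address
```

an empty list means every peer in the list is being addressed privately.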
there is a patch in jira for the ec2/contrib scripts that addresses
these issues.
https://issues.apache.org/jira/browse/HADOOP-2410
if you use those scripts, you will be able to see a ganglia display
showing utilization on the machines. 8/7 mappers/reducers sounds like a lot.
ymmv
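to put rough numbers on that, here is a small sketch using only the figures from the quoted message below (8 map + 7 reduce slots, 3 vcpus on the dev slaves, 2 on the ec2 large instances, 3.5 vs 15 hours). the function name is just for illustration, and it assumes the 8/7 counts are per slave.

```python
def slots_per_vcpu(map_slots, reduce_slots, vcpus):
    """Task slots competing for each virtual CPU on one slave."""
    return (map_slots + reduce_slots) / vcpus

dev_load = slots_per_vcpu(8, 7, 3)   # dev slave: 3 vcpus
ec2_load = slots_per_vcpu(8, 7, 2)   # ec2 large: 2 vcpus

expected_slowdown = 3 / 2            # from the vcpu count alone
observed_slowdown = 15 / 3.5         # 15 hours on ec2 vs 3.5 on dev

print(dev_load, ec2_load)
print(expected_slowdown, round(observed_slowdown, 2))
```

so each ec2 vcpu is juggling about 7.5 task slots versus 5 on dev, and the observed slowdown (roughly 4.3x) is well beyond the 1.5x that the cpu count alone would suggest.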
On Apr 9, 2008, at 7:07 PM, Nate Carlson wrote:
Hey all,
We've got a job that we're running in both a development
environment, and out on EC2. I've been rather displeased with the
performance on EC2, and was curious if the results that we've been
seeing are similar to other people's, or if I've got something
misconfigured. ;) In both environments, the load on the master
node is around 1-1.5, and the load on the slave nodes is around
8-10. I have also tried cranking up the JVM memory on the EC2 nodes
(since we got RAM to blow), with very little performance difference.
Basically, the job takes about 3.5 hours on development, but takes
15 hours on EC2. With the portion that takes all the time, it is not
dependent on any external hosts - just the MySQL server on the
master node. I benchmarked the VCPUs between our dev and EC2, and
they are about equivalent. I would expect EC2 to take 1.5x as long,
since there is one less CPU per slave, but it's taking much longer
than that.
Appreciate any tips!
Similarities between the environments:
- 1 master node, 2 slave nodes
- 1 mapper and reducer on the master, 8 mappers and 7 reducers on the
slaves
- Hadoop 0.16.2
- Local HDFS storage (we were using S3 on amazon before, and I
switched to local storage)
- MySQL database running on the master node
- Xen VMs in both environments (our own Xen for dev, Amazon's for
EC2)
- Debian Etch 64-bit OS; 64-bit JVM
Development master node configuration:
- 4x VCPUs (Xeon E5335, 2 GHz)
- 3 GB memory
- 4 GB swap
Development slave nodes configuration:
- 3x VCPUs (Xeon E5335, 2 GHz)
- 2 GB memory
- 4 GB swap
EC2 Configuration ("Large" instance type):
- 2x VCPUs (Opteron, 2 GHz)
- 8 GB memory
- 4 GB swap
- All nodes running in the same availability zone
------------------------------------------------------------------------
| nate carlson | [EMAIL PROTECTED] | http://www.natecarlson.com |
| depriving some poor village of its idiot since 1981 |
------------------------------------------------------------------------
Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/