Hi,
I don't have any real benchmarks or tests to speak of that specifically
measure the performance benefits of a larger instance size. However, we
have played around a little and for our work (a form of document
clustering) the benefits of a larger instance were far outweighed by
having more of the less powerful instances. During the early days of
our experiments with Hadoop and EC2, this was by far and away the most
surprising thing (although in retrospect I guess it's not so strange!).
Not sure it answers your question, but food for thought hopefully.
Thanks,
Paul
On 29 Sep 2009, at 18:33, Brian Bockelman wrote:
Hey Kevin,
From seeing presentations from the HEP field (totally unrelated to
Hadoop), I've seen folks claim the large instance is more than 4x
better than the small, and less than 2x slower than extra-large.
I.e., it provided that application the best bang for its buck.
In other words, you're not completely crazy for believing this, and
other people have reported seeing non-linear differences between the
different instance types. I suspect the "best" will depend highly
on what your app is doing.
Brian
On Sep 29, 2009, at 12:19 PM, Kevin Peterson wrote:
Has anyone done any extensive testing of what instance types on Amazon
EC2 give you the most bang for the buck?
Given the normal Hadoop recommendations of beefy machines, I would
expect the best performance from the extra-large, but our testing
showed otherwise.
We did some rough testing while we were just getting started with like
a 10-node cluster, and we found that the extra-large instance doesn't
come close to twice the actual performance of the large instance
(pricing at $0.80 and $0.40). My rationalization is that some of the
resources are shared, and the extra-large instance corresponds to the
actual hardware, while the large instance sometimes gets to take
advantage of IO and network bandwidth beyond 50% when the other tenant
isn't doing much.
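To make the price/performance comparison concrete, here is a rough
sketch of the arithmetic. Only the two prices come from the numbers
above (assumed here to be per instance-hour); the throughput figures and
the function names are made up for illustration and would need to be
replaced with your own measurements.

```python
# Prices quoted above, assumed to be USD per instance-hour.
PRICE_PER_HOUR = {"large": 0.40, "extra-large": 0.80}


def cost_per_job(price_per_hour, jobs_per_hour):
    """Dollars spent per completed job on a single instance."""
    if jobs_per_hour <= 0:
        raise ValueError("jobs_per_hour must be positive")
    return price_per_hour / jobs_per_hour


def break_even_speedup(price_big, price_small):
    """Speedup the pricier instance needs just to match cost per job."""
    return price_big / price_small


# HYPOTHETICAL throughput numbers (jobs per hour), not measurements.
# Substitute whatever your own benchmark reports.
measured = {"large": 10.0, "extra-large": 15.0}

costs = {
    name: cost_per_job(PRICE_PER_HOUR[name], measured[name])
    for name in PRICE_PER_HOUR
}

if __name__ == "__main__":
    needed = break_even_speedup(
        PRICE_PER_HOUR["extra-large"], PRICE_PER_HOUR["large"]
    )
    print("extra-large must be %.1fx faster to break even" % needed)
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
        print("%-12s $%.4f per job" % (name, cost))
```

With these prices the extra-large has to deliver at least twice the
throughput of the large to be worth it on cost alone, so anything short
of 2x favors running more large instances.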
I'm revisiting our config because we're deploying HBase soon, and I'm
not sure whether I would be better off going to the extra-large
instances so that I can co-locate the tasktrackers and the region
servers on the same nodes, or if I should stick with large instances
and put HBase on separate servers. Mostly I'm wondering if my results
were a fluke.