On 06.09.2007, at 09:56, Pietu Pohjalainen wrote:
Jeroen Verhagen wrote:
On 9/5/07, Steve Schlosser <[EMAIL PROTECTED]> wrote:
question, but I was wondering if anyone has a reasonable qualitative
answer that I can pass on when people ask.
Is this question really relevant since Hadoop is designed to run on a
cluster of commodity hardware Google-style? If there were any
difference I'm sure it would be solved by adding 1 machine to the
cluster.
Isn't it about whether to add 30% or 50% more machines? That starts to
get significant when you are deciding whether to have 1000 or 1500
machines.
A plain Java vs. <some language> discussion is way too simple. I've
been working on a Java project that way (!!) out-performed a similar
C++ project. The design and a smart implementation make more
difference than just the choice of language. Long-running vs.
short-running processes ... all that has already been said. At least
that's my experience. That being said, for Hadoop the
one-child-JVM-per-job model is what carries quite a bit of overhead. If
you are not scared that your jobs will tear down your tasktrackers, we
have an in-JVM execution patch. (not submitted yet though)
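For anyone curious how large that per-task overhead is, here is a small
sketch that times the spawn and exit of a trivial child JVM, which is
roughly the fixed cost the tasktracker pays once per task under the
one-child-JVM-per-job model. The class name `JvmSpawnCost` is made up
for this example, and it assumes a `java` binary is on the PATH; actual
numbers will vary with JVM version and hardware.

```java
import java.io.IOException;

public class JvmSpawnCost {
    // Launch `java -version` as a child process, wait for it to exit,
    // and return the elapsed wall-clock time in milliseconds. This
    // approximates the fixed startup cost of forking a fresh JVM.
    static long timeChildJvmMillis() throws IOException, InterruptedException {
        long start = System.nanoTime();
        Process p = new ProcessBuilder("java", "-version")
                .redirectErrorStream(true)
                .start();
        p.waitFor();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("child JVM spawn+exit: "
                + timeChildJvmMillis() + " ms");
    }
}
```

Run once per task in a job with thousands of short tasks, even a few
hundred milliseconds of startup adds up, which is what an in-JVM
execution path avoids.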
cheers
--
Torsten