Hadoop still has a lot of inefficiencies in it.
Most of them are not related to the language choice.
If you look at what the per node tasks are doing (as opposed to the
name node and job tracker) you will see that very little real work is
being done by Hadoop Java code.
Pumping bytes / IO is done in library calls that are native code.
The data node should be able to do its work in a fraction of a node's
CPU resources.
Your inner loop map-reduce code can be coded in C++ if you prefer.
So I don't think the choice of Java represents a real performance
hurdle.
Depending on your workload, the use of Hadoop may represent a huge
performance gain or loss. You will need to benchmark it against your
needs. But I am not too concerned about Java, and we are running
thousands of servers.
E14
On Sep 6, 2007, at 1:37 AM, Torsten Curdt wrote:
On 06.09.2007, at 09:56, Pietu Pohjalainen wrote:
> Jeroen Verhagen wrote:
>> On 9/5/07, Steve Schlosser <[EMAIL PROTECTED]> wrote:
>>
>>> question, but I was wondering if anyone has a reasonable qualitative
>>> answer that I can pass on when people ask.
>>>
>> Is this question really relevant since Hadoop is designed to run on a
>> cluster of commodity hardware Google-style? If there were any
>> difference I'm sure it would be solved by adding 1 machine to the
>> cluster.
>>
>
>
> Isn't it about whether to add 30% or 50% more machines? Which is
> starting to get significant when you think whether to have 1000 or
> 1500 machines.
A plain Java vs <some language> discussion is way too simple. I've
been working on a Java project that way (!!) out-performed a similar
C++ project. The design and a smart implementation will make more
difference than just the plain language. Long running vs short
running... all of that has already been said. At least that's my
experience. That being said, for Hadoop the one-child-jvm-per-job is
what has quite a bit of overhead. If you are not scared that your
jobs will tear down your tasktrackers - we have an in-jvm execution
patch. (not submitted yet though)
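
To get a rough feel for when the child-JVM startup cost matters, here is a back-of-the-envelope sketch. The function name and all the numbers are hypothetical, picked only for illustration and not measured on any real cluster. It assumes every task pays a fixed JVM startup cost before doing its work.

```python
def startup_overhead_fraction(num_tasks, jvm_startup_s, work_per_task_s):
    """Fraction of total task time spent starting child JVMs.

    Assumes each task pays one fixed startup cost (a simplification).
    """
    startup = num_tasks * jvm_startup_s
    work = num_tasks * work_per_task_s
    return startup / (startup + work)


# Hypothetical numbers: 1 s JVM startup, short (4 s) vs long (99 s) tasks.
short = startup_overhead_fraction(num_tasks=1000, jvm_startup_s=1.0, work_per_task_s=4.0)
long_ = startup_overhead_fraction(num_tasks=1000, jvm_startup_s=1.0, work_per_task_s=99.0)
print(f"short tasks: {short:.0%}, long tasks: {long_:.0%}")
```

Under these made-up numbers the short tasks lose about a fifth of their time to startup while the long ones lose about one percent, which is why the long running vs short running distinction matters more than the language.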
cheers
--
Torsten