Is it easier if we change the question to: "Why did the Java people create Hadoop before the C++ people?"
I agree that for a framework like Hadoop, execution efficiency is a higher priority than developer productivity. And if the user can use any language to write the map and reduce functions (as with Hadoop Streaming), then we should use the most efficient language to write the core framework. But again, don't forget the dynamics. It is not about which language is the most efficient. It is about which language the group of parallel-computing experts who are willing to spend time on open source is more familiar with (or passionate about).

Rgds,
Ricky

-----Original Message-----
From: Chris Dyer [mailto:[email protected]]
Sent: Monday, October 11, 2010 9:20 PM
To: [email protected]
Cc: [email protected]
Subject: Re: Why hadoop is written in java?

The Java memory overhead is quite a serious problem, and a legitimate and serious criticism of Hadoop. For MapReduce applications, it is often (although not always) possible to improve performance by doing more work in memory (e.g., using combiners and the like) before emitting data. Thus, the more memory available to your application, the more efficiently it runs. Therefore, if you have a framework that locks up 500 MB rather than 50 MB, you systematically get less performance out of your cluster.

The second issue is that C/C++ bindings are common and widely used from many languages, but it is not generally possible to interface directly with Java (or Java libraries) from another language, unless that language is also built on top of the JVM. This is very unfortunate because many problems that would be quite naturally expressed in MapReduce are better solved in non-JVM languages. But Java is what we have, and it works well enough for many things.

On Mon, Oct 11, 2010 at 11:18 PM, Dhruba Borthakur <[email protected]> wrote:
> I agree with others on this list that Java provides faster software
> development, the IO cost in Java is practically the same as in C/C++, etc.
> In short, most pieces of distributed software can be written in Java without
> any performance hiccups, as long as it is only system metadata that is
> handled by Java.
>
> One problem is when data flow has to occur in Java. Each record that is read
> from storage has to be deserialized, uncompressed and then processed. This
> processing can be very slow in Java compared to other languages, especially
> because of the creation/destruction of too many objects. It would have been
> nice if the map/reduce task could have been written in C/C++, or better
> still, if the sorting inside the MR framework could occur in C/C++.
>
> thanks,
> dhruba
>
> On Mon, Oct 11, 2010 at 4:50 PM, helwr <[email protected]> wrote:
>>
>> Check out this thread:
>> https://www.quora.com/Why-was-Hadoop-written-in-Java
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Why-hadoop-is-written-in-java-tp1673148p1684291.html
>> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>
>
> --
> Connect to me at http://www.facebook.com/dhruba
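Chris's point about doing more work in memory with combiners, and Dhruba's point about per-record object creation, can both be made concrete with a small Java sketch. This is not from the thread; it is a minimal WordCount-style job using the org.apache.hadoop.mapreduce API, and the class name WordCountWithCombiner is illustrative. It reuses single Writable instances per task instead of allocating new objects for every record, and it registers the reducer as a combiner so partial sums are aggregated on the map side before data is emitted.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountWithCombiner {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    // Reuse the same Writable instances for every record to limit the
    // per-record object creation/destruction Dhruba mentions.
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count with combiner");
    job.setJarByClass(WordCountWithCombiner.class);
    job.setMapperClass(TokenizerMapper.class);
    // The combiner does "more work in memory before emitting data":
    // partial sums are computed on the map side, so less intermediate
    // data is spilled, sorted, and shuffled across the network.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The two details tie directly back to the thread: the combiner trades heap space for less shuffled data, which is why a framework that locks up hundreds of megabytes per task hurts, and the reused Text/IntWritable instances reduce the garbage-collection pressure that makes per-record processing in Java slower than in C/C++.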
