Is it easier if we change the question to: "Why did the Java people create Hadoop before the C++ people?"
I agree that for a framework like Hadoop, execution efficiency is a higher priority than developer productivity. And if the user can use any language to write the map and reduce functions (as with Hadoop Streaming), then we should use the most efficient language to write the core framework. But again, don't forget the dynamics. It is not about which language is the most efficient. It is about which language the group of parallel-computing experts who are willing to spend time on open source is more familiar with (or passionate about).

Rgds,
Ricky

-----Original Message-----
From: Chris Dyer [mailto:[email protected]]
Sent: Monday, October 11, 2010 9:20 PM
To: [email protected]
Cc: [email protected]
Subject: Re: Why hadoop is written in java?

The Java memory overhead is quite a serious problem, and a legitimate and serious criticism of Hadoop. For MapReduce applications, it is often (although not always) possible to improve performance by doing more work in memory (e.g., using combiners and the like) before emitting data. Thus, the more memory available to your application, the more efficiently it runs. Therefore, if you have a framework that locks up 500 MB rather than 50 MB, you systematically get less performance out of your cluster.

The second issue is that C/C++ bindings are common and widely used from many languages, but it is not generally possible to interface directly with Java (or Java libraries) from another language, unless that language is also built on top of the JVM. This is very unfortunate because many problems that would be quite naturally expressed in MapReduce are better solved in non-JVM languages. But Java is what we have, and it works well enough for many things.

On Mon, Oct 11, 2010 at 11:18 PM, Dhruba Borthakur <[email protected]> wrote:
> I agree with others on this list that Java provides faster software
> development, the IO cost in Java is practically the same as in C/C++, etc.
> In short, most pieces of distributed software can be written in Java without
> any performance hiccups, as long as it is only system metadata that is
> handled by Java.
>
> One problem is when data flow has to occur in Java. Each record that is read
> from storage has to be deserialized, uncompressed and then processed. This
> processing can be very slow in Java compared to other languages, especially
> because of the creation/destruction of too many objects. It would have been
> nice if the map/reduce task could have been written in C/C++, or better
> still, if the sorting inside the MR framework could occur in C/C++.
>
> thanks,
> dhruba
>
> On Mon, Oct 11, 2010 at 4:50 PM, helwr <[email protected]> wrote:
>>
>> Check out this thread:
>> https://www.quora.com/Why-was-Hadoop-written-in-Java
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Why-hadoop-is-written-in-java-tp1673148p1684291.html
>> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>
>
> --
> Connect to me at http://www.facebook.com/dhruba
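Chris's point about doing more work in memory with combiners, and Dhruba's point about per-record object creation, can both be made concrete with a small Java sketch. This is not from the thread; it is a minimal WordCount-style job using the org.apache.hadoop.mapreduce API, and the class name WordCountWithCombiner is illustrative. It reuses single Writable instances per task instead of allocating new objects for every record, and it registers the reducer as a combiner so partial sums are aggregated on the map side before data is emitted.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountWithCombiner {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    // Reuse the same Writable instances for every record to limit the
    // per-record object creation/destruction Dhruba mentions.
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count with combiner");
    job.setJarByClass(WordCountWithCombiner.class);
    job.setMapperClass(TokenizerMapper.class);
    // The combiner does "more work in memory before emitting data":
    // partial sums are computed on the map side, so less intermediate
    // data is spilled, sorted, and shuffled across the network.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The two details tie directly back to the thread: the combiner trades heap space for less shuffled data, which is why a framework that locks up hundreds of megabytes per task hurts, and the reused Text/IntWritable instances reduce the garbage-collection pressure that makes per-record processing in Java slower than in C/C++.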
