On Tue, Oct 12, 2010 at 12:20 AM, Chris Dyer <[email protected]> wrote:
> The Java memory overhead is a quite serious problem, and a legitimate
> and serious criticism of Hadoop. For MapReduce applications, it is
> often (although not always) possible to improve performance by doing
> more work in memory (e.g., using combiners and the like) before
> emitting data. Thus, the more memory available to your application,
> the more efficiently it runs. Therefore, if you have a framework that
> locks up 500mb rather than 50mb, you systematically get less
> performance out of your cluster.
>
> The second issue is that C/C++ bindings are common and widely used
> from many languages, but it is not generally possible to interface
> directly with Java (or Java libraries) from another language, unless
> that language is also built on top of the JVM. This is very
> unfortunate because many problems that would be quite naturally
> expressed in MapReduce are better solved in non-JVM languages.
>
> But, Java is what we have, and it works well enough for many things.
>
> On Mon, Oct 11, 2010 at 11:18 PM, Dhruba Borthakur <[email protected]> wrote:
>> I agree with others on this list that Java provides faster software
>> development, the IO cost in Java is practically the same as in C/C++, etc.
>> In short, most pieces of distributed software can be written in Java without
>> any performance hiccups, as long as it is only system metadata that is
>> handled by Java.
>>
>> One problem is when data-flow has to occur in Java. Each record that is read
>> from the storage has to be de-serialized, uncompressed and then processed.
>> This processing can be very slow in Java compared to when written in other
>> languages, especially because of the creation/destruction of too many
>> objects. It would have been nice if the map/reduce task could have been
>> written in C/C++, or better still, if the sorting inside the MR framework
>> could occur in C/C++.
>>
>> thanks,
>> dhruba
>>
>> On Mon, Oct 11, 2010 at 4:50 PM, helwr <[email protected]> wrote:
>>
>>>
>>> Check out this thread:
>>> https://www.quora.com/Why-was-Hadoop-written-in-Java
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/Why-hadoop-is-written-in-java-tp1673148p1684291.html
>>> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>>>
>>
>>
>>
>> --
>> Connect to me at http://www.facebook.com/dhruba
>>
Hate to say it this way... but this is yet another round of "Java is slow compared to the equivalent, non-existent C/C++ alternative." Until http://code.google.com/p/qizmt/ wins the TeraSort benchmark, or until Google open sources Google MapReduce, the comparison stays hypothetical. I am sure that if someone coded Hadoop in assembler, it would trump the theoretical Hadoop written in C as well.
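
That said, a fair amount of the per-record object churn Dhruba describes can be tamed in plain Java today: reuse the Writable output objects instead of allocating new ones per record, and push partial aggregation into a combiner, as Chris suggests. A rough sketch of both patterns on the standard word-count shape, assuming the 0.20-style org.apache.hadoop.mapreduce API (class and variable names here are just illustrative, not anything in the Hadoop distribution):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper that reuses its output objects instead of allocating a new
  // Text/IntWritable per record, keeping per-record GC pressure down.
  public static class TokenMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final Text word = new Text();                 // reused every call
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer it = new StringTokenizer(value.toString());
      while (it.hasMoreTokens()) {
        word.set(it.nextToken());   // overwrite the same Text instance
        context.write(word, ONE);
      }
    }
  }

  // Reducer doubles as the combiner: partial sums are computed in memory
  // on the map side before anything hits the shuffle.
  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable(); // reused every call

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setCombinerClass(SumReducer.class);   // combine before the shuffle
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Reusing word/result and registering the combiner does not make the JVM overhead disappear, but it takes most of the per-record allocations out of the inner loop, which is where the GC pain Dhruba mentions actually shows up.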
