On 12/10/10 05:20, Chris Dyer wrote:
The Java memory overhead is a serious problem and a legitimate
criticism of Hadoop. For MapReduce applications, it is
often (although not always) possible to improve performance by doing
more work in memory (e.g., using combiners and the like) before
emitting data. Thus, the more memory available to your application,
the more efficiently it runs. Therefore, if you have a framework that
locks up 500 MB rather than 50 MB, you systematically get less
performance out of your cluster.
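To make the combiner point concrete, here is a minimal sketch of in-mapper combining in plain Java with no Hadoop dependency. The class name `InMapperCombiner`, the method `countEmitted`, and the flush-when-full policy are hypothetical illustrations, not Hadoop's API. The method counts how many (word, count) pairs a mapper would emit if it could buffer at most `maxBufferedKeys` distinct keys before flushing. A bigger buffer, which needs more heap, means fewer pairs sent to the shuffle.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of in-mapper combining; not the Hadoop API.
public class InMapperCombiner {

    // Returns the number of (key, count) pairs emitted when partial counts
    // are buffered in memory and flushed whenever the buffer holds
    // maxBufferedKeys distinct keys (and once more at the end of the task).
    static int countEmitted(List<String> tokens, int maxBufferedKeys) {
        Map<String, Integer> buffer = new HashMap<>();
        int emitted = 0;
        for (String token : tokens) {
            buffer.merge(token, 1, Integer::sum); // combine in memory
            if (buffer.size() >= maxBufferedKeys) {
                emitted += buffer.size();          // flush: one pair per key
                buffer.clear();
            }
        }
        emitted += buffer.size();                  // final flush
        return emitted;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("a", "b", "a", "c", "a", "b");
        // Compare emitted pairs under increasingly generous memory budgets.
        for (int budget : new int[] {1, 3, 10}) {
            System.out.println("budget " + budget + " -> "
                    + countEmitted(tokens, budget) + " pairs");
        }
    }
}
```

With a one-key buffer every token becomes its own pair. With room for every distinct word, the six tokens collapse to three pairs. Heap taken up by the framework shrinks the buffer you can afford.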

The second issue is that C/C++ bindings are common and widely used
from many languages, but it is not generally possible to interface
directly with Java (or Java libraries) from another language, unless
that language is also built on top of the JVM. This is very
unfortunate because many problems that would be quite naturally
expressed in MapReduce are better solved in non-JVM languages.

A few years back I went from a Java project to six months of doing something in C/C++.

At first it was like rediscovering stuff: mixins! Operator overloading! The STL!

Then you start looking at the build and test process and think "this hasn't moved on for a while". Next comes struggling with CppUnit to do test-first development of a COM service, and setting up CruiseControl to run a build.xml that just <exec>s Visual Studio to build your app before running the tests. Eventually, the tests worked.

But then there were the memory leaks, the reference-counting problems, the threading and race-condition issues, and the inconsistencies between Windows and Linux. And the string types. Oh, so many string types: char*, TCHAR*, LPCSTR, BSTR, etc.

In Java, you have to go out of your way to get a memory leak, so if your tests pass, your code is functional and good to ship. But in C/C++, the engineering needed to go from code that passes its functional tests to code that doesn't leak memory, is thread-safe, and is secure is far harder.

Try representing a large graph in C++ that is shared across threads without running into memory problems, and you'll see what I mean.

I agree that some Java independence would be nice, but I'd go higher up, towards more graph- and list-centric languages, not closer to the metal.

Scala support, anyone?
