I would say the biggest difference between a C-Hadoop and Java-Hadoop would be memory usage on the Namenode (and the CPU benefits that come with cheaper memory allocation). The rest of the nodes in the cluster would perform about the same. C is better suited to low-level memory optimizations (both in overall size and in number of allocations). My guesstimate is that the Namenode could use 30-40% less memory.
Note that memory is an issue mainly for very large clusters.

Raghu.

Steve Schlosser wrote:
Please excuse a possibly heretical question...

My colleagues and I have been working with Hadoop lately, and I keep getting asked the same question: what is the performance impact of having the system written in Java? Some folks, even today, are suspicious of the overhead of Java, especially for systems programming. I realize that an apples-to-apples comparison of Java-based Hadoop to, say, C-based Hadoop is entirely out of the question, but I was wondering if anyone has a reasonable qualitative answer that I can pass on when people ask.

Thanks!
-steve
