Java's features, such as garbage collection, runtime array index checking, and cleaner syntax (no pointers), make it a good language for Hadoop. One can develop MapReduce apps faster and maintain the code more easily than with C/C++, which lets clients focus on their business logic and use cases.
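To make the array index checking point concrete, here is a minimal sketch in plain Java. It is not Hadoop code: the class name `BoundsCheckDemo` and the helper `fieldAt` are hypothetical names made up for illustration. It assumes a mapper-style task that splits a record into fields and asks for a field that may not exist.

```java
class BoundsCheckDemo {
    // Hypothetical helper (not part of Hadoop): returns the field at the
    // given index, or null if the record has fewer fields than expected.
    static String fieldAt(String[] fields, int index) {
        try {
            return fields[index];
        } catch (ArrayIndexOutOfBoundsException e) {
            // The JVM checks the index at run time and throws, instead of
            // reading whatever happens to sit past the end of the array.
            return null;
        }
    }

    public static void main(String[] args) {
        String[] fields = "a,b,c".split(",");
        System.out.println(fieldAt(fields, 1)); // valid index
        System.out.println(fieldAt(fields, 7)); // out of range, handled
    }
}
```

In C/C++ the same out-of-range read is undefined behavior: it may crash, or it may silently return garbage. In Java it is a well-defined exception that a task can catch, log, or let fail with a stack trace.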
For a fairly high-level implementation of MapReduce that uses clusters of COTS hardware as compute nodes, the main bottleneck in most applications will be network I/O. In such cases, the speed advantage of C/C++ over Java is less attractive: you will spend more time shuffling packets around anyway.

C/C++ applications are difficult to port and tend to be system specific. Let's say you are trying to optimize a certain portion of your mapper code with pointer manipulation. Such operations are inherently error prone because of their proximity to the hardware. The JVM alleviates most of these issues: you don't have to think about how many bytes a double occupies, and your code will be portable across 32-bit and 64-bit architectures, across big- and little-endian systems, etc.

Even with Java's safety and comfort, debugging distributed Hadoop MapReduce apps is a pain in the butt. Just imagine what would happen with C/C++, where you would be buried in seg faults.

I would say that you can use C/C++ to implement MapReduce if you are using multicore CPUs or GPUs as your underlying platform, where you know the hardware intimately and are free from network I/O latency.

-Dhruv Kumar

On Tue, Aug 16, 2011 at 12:05 PM, Bill Graham <[email protected]> wrote:

> There was a fairly long discussion on this topic at the beginning of the
> year FYI:
>
> http://search-hadoop.com/m/JvSQe2wNlY11
>
> On Mon, Aug 15, 2011 at 9:00 PM, Chris Song <[email protected]> wrote:
>
> > Why hadoop should be built in JAVA?
> >
> > For integrity and stability, it is good for hadoop to be implemented in
> > Java
> >
> > But, when it comes to speed issue, I have a question...
> >
> > How will it be if HADOOP is implemented in C or Phython?
> >
