Has anyone benchmarked the performance difference between the ways of using Hadoop?
  1) Java vs C++
  2) Java vs Streaming

From looking at the Hadoop architecture, since the TaskTracker forks a 
separate process anyway to run the user-supplied map() and reduce() functions, 
I don't see where the performance overhead of Hadoop Streaming would come from 
(of course the efficiency of the chosen script will be a factor, but I think 
this is orthogonal).  On the other hand, I see a lot of benefits of using 
Streaming, including:

  1) I can pick a language that offers a different programming paradigm (e.g. 
I may choose a functional language, or logic programming, if it suits the 
problem better).  In fact, I can even choose Erlang for the map() and Prolog 
for the reduce().  Mixing and matching lets me fit each stage to the language 
best suited to it.
  2) I can pick the language that I am familiar with, or one that I like.
  3) It is easy to switch to another language in a fine-grained, incremental 
way if I choose to do so in future.

Even if I am a Java programmer, I can still write a main() method that reads 
its input from standard in and writes its output to standard out, and I don't 
see that I am losing much by doing that.  The benefit is that my code can be 
easily moved to another language in future.
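
To make that concrete, here is a rough sketch of what such a stdin/stdout 
program might look like in Java.  The class name StreamingMapper and the 
word-count task are just assumptions for illustration; the only thing taken 
from Streaming itself is its default convention of one record per line, with 
key and value separated by a tab.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class: a word-count style mapper that talks to
// Hadoop Streaming only through standard in and standard out.
public class StreamingMapper {

    // Turns one input line into "word<TAB>1" records, one per word.
    // Kept free of I/O so the logic is easy to port or test on its own.
    public static List<String> mapLine(String line) {
        List<String> records = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                records.add(word + "\t1");
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        PrintWriter out = new PrintWriter(new BufferedWriter(
                new OutputStreamWriter(System.out, StandardCharsets.UTF_8)));

        // Read input records line by line until standard in is closed.
        String line;
        while ((line = in.readLine()) != null) {
            for (String record : mapLine(line)) {
                out.println(record);
            }
        }
        // Flush buffered output before the process exits.
        out.flush();
    }
}
```

A program like this would be handed to Streaming as the -mapper command, and 
the reducer could be a similar program in the same or a different language.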

Am I missing something here?  Or are the majority of Hadoop applications 
already written using Hadoop Streaming?

Rgds,
Ricky
