Thanks all. Charles, your pointer led me to Baidu's slides, "Introduction to Hadoop C++ Extension" <http://hic2010.hadooper.cn/dct/attach/Y2xiOmNsYjpwZGY6ODI5>, which describe their experience; the sixth slide shows exactly what I was looking for. Besides offering no performance gain, pipes still makes memory hard to manage, hence the development of HCE.

Thanks,
Mark

On Thu, Apr 5, 2012 at 2:23 PM, Charles Earl <charles.ce...@gmail.com> wrote:

> Also bear in mind that there is a kind of detour involved, in the sense
> that a pipes map must send key,value data back to the Java process and then
> to reduce (more or less).
> I think that the Hadoop C Extension (HCE, there is a patch) is supposed to
> be faster.
> Would be interested to know if the community has any experience with HCE
> performance.
> C
>
> On Apr 5, 2012, at 3:49 PM, Robert Evans <ev...@yahoo-inc.com> wrote:
>
> > Both streaming and pipes do very similar things. They will fork/exec a
> > separate process that is running whatever you want it to run. The JVM that
> > is running hadoop then communicates with this process to send the data over
> > and get the processing results back. The difference between streaming and
> > pipes is that streaming uses stdin/stdout for this communication so
> > preexisting processing like grep, sed and awk can be used here. Pipes uses
> > a custom protocol with a C++ library to communicate. The C++ library is
> > tagged with SWIG compatible data so that it can be wrapped to have APIs in
> > other languages like python or perl.
> >
> > I am not sure what the performance difference is between the two, but in
> > my own work I have seen a significant performance penalty from using either
> > of them, because there is a somewhat large overhead of sending all of the
> > data out to a separate process just to read it back in again.
> >
> > --Bobby Evans
> >
> > On 4/5/12 1:54 PM, "Mark question" <markq2...@gmail.com> wrote:
> >
> > Hi guys,
> > quick question:
> > Are there any performance gains from hadoop streaming or pipes over
> > Java? From what I've read, it's only to ease testing by using your favorite
> > language. So I guess it is eventually translated to bytecode then executed.
> > Is that true?
> >
> > Thank you,
> > Mark
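
For anyone following the thread, the stdin/stdout contract Bobby describes for streaming can be sketched in a few lines of Python. This is only an illustration, not code from the thread: `run_mapper` and the sample input are hypothetical, and in-memory streams stand in for the pipe between the JVM and the child process. It assumes the usual streaming convention of one tab-separated key/value record per output line.

```python
import io
from typing import TextIO


def run_mapper(stdin: TextIO, stdout: TextIO) -> None:
    """Read raw text lines and write one "word<TAB>1" record per word."""
    for line in stdin:
        for word in line.split():
            stdout.write(f"{word}\t1\n")


# Stand-ins for the pipe between the Hadoop JVM and the forked process
# (hypothetical sample data). In a real streaming job the call would be
# run_mapper(sys.stdin, sys.stdout).
fake_in = io.StringIO("hadoop pipes\nhadoop streaming\n")
fake_out = io.StringIO()
run_mapper(fake_in, fake_out)
records = fake_out.getvalue().splitlines()
```

Every record crosses a process boundary as text in both directions, which is the serialization overhead mentioned above; nothing here is translated to bytecode.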