On Fri, Mar 2, 2012 at 10:18 AM, Subir S <subir.sasiku...@gmail.com> wrote:
> Hello Folks,
>
> Are there any pointers to such comparisons between Apache Pig and Hadoop
> Streaming Map Reduce jobs?

I do not see why you want to compare these two. Pig offers a language that lets you write data-flow operations, and it automatically runs those statements as a series of MR jobs for you (making it a great tool for getting data processing done quickly, without bothering with code), while streaming is something you use to write simple, non-Java MR jobs. Each has its own purpose.

> Also there was a claim in our company that Pig performs better than Map
> Reduce jobs? Is this true? Are there any such benchmarks available

Pig _runs_ MR jobs. It does apply job design (and some data) optimizations based on your queries, which is what may give it an edge over hand-designing elaborate flows of plain MR jobs with tools like Oozie/JobControl (which takes more time to do). But regardless, Pig only makes it easier to do the same thing, by letting you express it in Pig Latin statements.

--
Harsh J
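
To make the "simple, non-Java MR job" side of the comparison concrete, here is a minimal sketch of a word count written in the style of a Hadoop Streaming job, in Python. The function names (mapper, reducer, run_local) and the word-count task are illustrative assumptions, not something from the thread. In a real streaming job the mapper and reducer would be separate scripts reading stdin and writing tab-separated lines to stdout; here they are plain functions with a local map, sort, reduce simulation so the logic can be run without a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated 'word<TAB>1' record per word,
    the way a streaming mapper would print records to stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts for each key. The input must already be sorted
    by key, which is what the framework provides to a reducer."""
    parsed = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(lines):
    """Hypothetical local stand-in for the map -> sort -> reduce
    sequence; sorted() plays the role of the shuffle/sort phase."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for record in run_local(["pig runs mr jobs", "streaming runs mr jobs"]):
        print(record)
```

In Pig, the same count would typically be a handful of Pig Latin statements (load, split into words, group, count), and Pig would plan the MR job or jobs itself; that difference in who designs the job is the point being made above.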
