Re: Comparison of Apache Pig Vs. Hadoop Streaming M/R

Harsh J Thu, 01 Mar 2012 23:08:56 -0800

On Fri, Mar 2, 2012 at 10:18 AM, Subir S <subir.sasiku...@gmail.com> wrote:
> Hello Folks,
>
> Are there any pointers to such comparisons between Apache Pig and Hadoop
> Streaming Map Reduce jobs?


I do not see why you would compare these two. Pig offers a language
that lets you write data-flow operations and automatically runs those
statements as a series of MR jobs for you (making it a great tool for
getting data processing done quickly, without writing MR code
yourself), while streaming is what you use to write simple MR jobs in
languages other than Java. Each has its own purpose.
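
To make the streaming side concrete, here is a minimal sketch of a
streaming-style word count in Python. It assumes the usual streaming
contract: records arrive as lines on stdin, output goes to stdout as
tab-separated key/value lines, and the reducer sees its input sorted by
key. The function names and the "map"/"reduce" argument are my own
convention for illustration, not anything Hadoop requires.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word seen."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts for each key.

    Relies on the input being sorted by key, so that all records for
    one word are adjacent (groupby only merges adjacent records).
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical convention: the script runs as a mapper or a reducer
# depending on its first argument ("map" or "reduce").
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

You would hand a script like this to the streaming jar as the mapper
and reducer commands. Anything beyond a single job like this one
(joins, several grouping stages, and so on) means writing and chaining
more such scripts yourself, and that is the part Pig plans for you.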

> Also there was a claim in our company that Pig performs better than Map
> Reduce jobs? Is this true? Are there any such benchmarks available

Pig _runs_ MR jobs. It does apply job-design (and some data)
optimizations based on your queries, which may give it an edge
over elaborate flows of plain MR jobs that you design by hand with
tools like Oozie/JobControl (which takes more time to do). But
regardless, Pig is doing the same thing underneath; it only makes it
easier for you by letting you express the work as Pig Latin
statements.

-- 
Harsh J
