I haven't been reading this list like I should... Pig is meant to provide a more powerful and simpler abstraction for writing distributed processing logic than MapReduce by itself provides.
Power:
- We support joins.
- We provide the ability to select fields from records that will be passed to a function, so that general functions can be written and reused.
- We do some amount of optimization at the moment (more in the future) to reduce the number of actual jobs that get run.

Simplicity:
- The model has one kind of function: an eval function. It processes one record at a time. A dataset can have records grouped together, sorted, filtered, or have a projection applied to it, but functions just need to work on one record at a time. If a = load 'dataset', then MAP is

    foreach a generate map(*)

  and REDUCE is

    b = group a by $0; foreach b generate reduce(*)

- Pig Latin is a simple language that can be used directly. We have a simple interpreter called GRUNT that users can interact with to submit jobs.
- Eventually we would like to embed Pig Latin into Perl, Python, and Ruby to create Erlpay, Ythonpay, and Ubyray, but we are a bit low on developer bandwidth. We believe that by embedding Pig Latin into existing languages we would end up with a much more powerful, well-known, and natural environment to work in, as opposed to creating our own language like Sawzall.

ben

On Thursday 26 April 2007 15:17:24 Ian Holsman wrote:
> Jim Kellerman wrote:
> > Can someone comment on how Pig compares with Bigtable?
> >
> > On Thu, 2007-04-26 at 13:10 -0700, Doug Cutting wrote:
> >> FYI
> >>
> >> http://research.yahoo.com/project/pig
> >>
> >> Doug
>
> my understanding is
>
> bigtable/hbase stores the data
> mapreduce/hadoop manipulates/creates the data to be stored in bigtable
> via functions, and controls the distribution
> sawzall/pig is a query language to extract information from it. I think
> it would create functions for mapreduce/hadoop to run.
>
> regards
> Ian
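If it helps to make the MAP/REDUCE correspondence concrete, here is a minimal Python sketch of the eval-function model described above (my_map, my_reduce, and the toy word data are hypothetical stand-ins, not Pig code or Pig's implementation):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical eval functions; in Pig these would be user-defined.
def my_map(record):
    # Processes one record at a time, emitting a (key, value) pair.
    return (record, 1)

def my_reduce(group):
    # Also one record at a time -- but here each "record" is a whole
    # group produced by `group a by $0`: (key, records-with-that-key).
    key, records = group
    return (key, sum(v for _, v in records))

records = ["a", "b", "a"]

# MAP:    foreach a generate map(*)
mapped = [my_map(r) for r in records]

# REDUCE: b = group a by $0; foreach b generate reduce(*)
grouped = [(k, list(g))
           for k, g in groupby(sorted(mapped, key=itemgetter(0)),
                               key=itemgetter(0))]
reduced = [my_reduce(g) for g in grouped]
print(reduced)  # [('a', 2), ('b', 1)]
```

The point of the sketch is that grouping is done by the dataset operators, so both functions stay simple one-record-at-a-time evals.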
