Re: Algorithm implementations in Pig

David Stuart Mon, 22 Feb 2010 04:33:06 -0800

Seems like the guys at Twitter are going down the Pig/Hadoop route:
http://highscalability.com/blog/2010/2/19/twitters-plan-to-analyze-100-billion-tweets.html

It could be worth getting them on board the Mahout wagon, especially given the 
previous discussion about classification efforts:
http://old.nabble.com/Twitter-Classification-td27227638.html

On 22 Feb 2010, at 12:13, Grant Ingersoll wrote:


> I'm all for Pig, especially once we are a TLP.  I haven't had the proper time 
> to review the PLSI implementation, but it looks useful.  I agree on the other 
> points, though, in that I think it would be nice to have consistent 
> formats based on Vector so that things can be more portable.
> 
> 
> On Feb 22, 2010, at 2:41 AM, Ankur C. Goel wrote:
> 
>> Hi Folks,
>>              I would like to know how the Mahout community feels about having 
>> some of the Mahout algorithms implemented in Pig - 
>> http://hadoop.apache.org/pig. The benefits of using Pig are many, including:
>> 
>> 
>> 1.  Small learning curve; people with a bit of SQL knowledge will find it 
>> very easy.
>> 2.  Operations like grouping, aggregation, and joins need just a few lines 
>> of Pig code.
>> 3.  Insulation against Hadoop complexity - Job chains and JobConf.
>> 4.  Quick prototyping and hence increased programmer productivity.
>> 
>> I had Sean's opinion on this and he was not too comfortable with the idea of 
>> having things in different languages in Mahout. However, given the benefits 
>> of Pig, I feel otherwise. I may be biased here due to my own experience of 
>> being able to do more in less time in Pig than in M/R, so I thought I would 
>> ask how folks feel.
>> 
>> Ted, I believe you have some Pig experience yourself, so any thoughts on 
>> this?
>> 
>> Regards
>> -...@nkur
>
