Seems like the guys at Twitter are going down the Pig/Hadoop route:
http://highscalability.com/blog/2010/2/19/twitters-plan-to-analyze-100-billion-tweets.html

It could be worth getting them on board the Mahout wagon, especially given the
previous discussion about classification efforts:
http://old.nabble.com/Twitter-Classification-td27227638.html

On 22 Feb 2010, at 12:13, Grant Ingersoll wrote:

> I'm all for Pig, especially once we are a TLP. I haven't had the proper time
> to review the PLSI implementation, but it looks useful. I agree on the other
> points, though, in that I think it would be nice to have consistent
> formats based on Vector so that things can be more portable.
>
>
> On Feb 22, 2010, at 2:41 AM, Ankur C. Goel wrote:
>
>> Hi Folks,
>> I would like to know how the Mahout community feels about having
>> some of the Mahout algorithms implemented in Pig -
>> http://hadoop.apache.org/pig. The benefits of using Pig are many, including:
>>
>>
>> 1. Small learning curve; people with a bit of SQL knowledge will find it
>> very easy.
>> 2. Operations like grouping, aggregations, and joins need just a few lines
>> of Pig code.
>> 3. Insulation against Hadoop complexity - job chains and JobConf.
>> 4. Quick prototyping and hence increased programmer productivity.
>>
>> I had Sean's opinion on this and he was not too comfortable with the idea of
>> having things in different languages in Mahout. However, given the benefits
>> of Pig, I feel otherwise. I may be biased here due to my own experience of
>> being able to do more in less time in Pig than in M/R, so I thought let
>> me ask how folks feel.
>>
>> Ted, I believe you have some Pig experience yourself, so any thoughts on
>> this?
>>
>> Regards
>> -...@nkur
>