I too have mixed opinions about Pig. Pig is a good choice for quick
prototyping and testing. However, these are the pitfalls I have
observed in Pig.

Pig is not easy to debug. It also has performance issues: it is a
layer on top of Hadoop, so there is the overhead of converting Pig
scripts into map-reduce jobs. When the code is written directly against
Hadoop, the developer/user can improve performance by tuning various
parameters, e.g. the number of mappers or the input format. That is
not the case with Pig. There are also compatibility issues between
Pig and Hadoop: if I am using Pig version x on Hadoop version y, there
might be incompatibilities, and I need to spend time resolving them
because it is not easy to figure out the errors.

I believe the main aim of Mahout is to provide scalable algorithms
that can be used to solve real-world problems. In that case, if Pig
gets rid of the above pitfalls, it would be a good choice, as it would
greatly reduce development time and effort.

Thanks
Pallavi

-----Original Message-----
From: Ted Dunning [mailto:ted.dunn...@gmail.com] 
Sent: Monday, February 22, 2010 11:32 PM
To: mahout-dev@lucene.apache.org
Subject: Re: Algorithm implementations in Pig

As an interesting test case, can you write a Pig program that counts
words?

BUT, it takes an input file name AND an input field name.

On Mon, Feb 22, 2010 at 9:56 AM, Ted Dunning <ted.dunn...@gmail.com>
wrote:

>
> That isn't an issue here.  It is the invocation of pig programs and 
> passing useful information to them that is the problem.
>
>
> On Mon, Feb 22, 2010 at 9:20 AM, Ankur C. Goel
> <gan...@yahoo-inc.com> wrote:
>
>> Scripting ability, while still limited, has better streaming support,
>> so you can have relations streamed into a custom script executing in
>> either the map or reduce phase, depending upon where it is placed.
>>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
>


--
Ted Dunning, CTO
DeepDyve
