So I created a JIRA issue: https://issues.apache.org/jira/browse/MAHOUT-106 and also submitted a patch along with README instructions. Please feel free to try it out with different input samples. The default behaviour is to run Pig in local mode. I'd appreciate any suggestions/reviews.
-Prasen

On Wed, Feb 11, 2009 at 5:32 PM, Grant Ingersoll <[email protected]> wrote:
> This is excellent, Prasen.
>
> I see no reason not to include them. We are about ML first,
> distributed/scalable ML second and Hadoop-based third, IMO. Java would be
> a distant fourth in my mind. In other words, I don't feel particularly
> strongly about us being Java-only or even Hadoop-only. To me there is a
> significant need for community-developed machine learning capabilities
> with a commercially friendly license. Add in the ability to scale/run
> efficiently and you have a home run. In fact, those are the very reasons
> we founded Mahout.
>
>
> On Feb 11, 2009, at 6:40 AM, prasenjit mukherjee wrote:
>
>> Pig is a higher-level language (more like Sawzall for Google's
>> MapReduce) on top of Hadoop which makes Hadoop easy to use.
>>
>> It has SQL-like syntax and can break a command into separate
>> MapReduce tasks and also chain them. From an execution point of view
>> they are as simple as running a shell script, with very few
>> operators/commands.
>>
>> Some of its commands are join, group, cogroup, load etc.
>>
>> For example, the following Pig script takes a logfile in the format
>> <txid>,<txt>,<user> and outputs a user-term-freq file in the following
>> format: <txt>\t<user>\t<cnt>
>>
>> raw = load 'tx_log.csv' using PigStorage(',') AS
>>     (transactionid:chararray, txt:chararray, user:chararray);
>> tokenized = FOREACH raw GENERATE user, flatten(TOKENIZE(txt)) as attribute;
>> user_term_freq = group tokenized by (user, attribute);
>> user_term_freq = foreach user_term_freq generate flatten(group), COUNT(tokenized);
>> store user_term_freq into 'user_term_freq.txt';
>>
>> During runtime Pig takes the input and breaks it into several map and
>> reduce tasks. It takes the hadoop-site.xml from its classpath.
>>
>> -Prasen
>>
>> On Wed, Feb 11, 2009 at 4:54 PM, Sean Owen <[email protected]> wrote:
>>>
>>> Needs to go somewhere like trunk/core/src/pig/main, right, versus /java/?
>>>
>>> I also see no harm in adding it, other than that it would remain
>>> pretty isolated, right? It isn't part of the build, can't be integrated
>>> with the other code, etc. Does it add value to package it with the
>>> project then?
>>>
>>> Perhaps I misunderstand what Pig can do or how it can relate to Java?
>>>
>>> On Wed, Feb 11, 2009 at 11:13 AM, Grant Ingersoll <[email protected]>
>>> wrote:
>>>>
>>>> Hmm, hadn't really thought about it, but I see no reason why we
>>>> wouldn't accept it and add it. I think our source tree can definitely
>>>> handle it.
>>>>
>>>> I'd propose it go somewhere under:
>>>> trunk/core/src/main/pig/plsi
>>>>
>>>> I'm not familiar with Pig, but I can learn, and I know others are. Is
>>>> it a single file?
>>>>
>>>> See http://cwiki.apache.org/MAHOUT/howtocontribute.html for
>>>> instructions on contributing. Basically, just attach the file(s) to a
>>>> JIRA issue.
>>>>
>>>> On Feb 11, 2009, at 2:18 AM, prasenjit mukherjee wrote:
>>>>
>>>>> Hi,
>>>>> I have implemented Hofmann's PLSI/EM algorithm in Pig, which I would
>>>>> like to contribute back to the community for further
>>>>> scrutiny/improvement. Let me know if Mahout is the appropriate forum
>>>>> or whether it should go to the Pig project.
>>>>>
>>>>> I haven't seen any non-Java contributions to Mahout yet, which raises
>>>>> the question: is Mahout Java-only?
>>>>>
>>>>> -Thanks,
>>>>> Prasen
>>>>
>>>> --------------------------
>>>> Grant Ingersoll
>>>> http://www.lucidimagination.com/
>>>>
>>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>>>> using Solr/Lucene:
>>>> http://www.lucidimagination.com/search
>>>>
>>>>
>>>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
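[Editor's note: the Pig script quoted in the thread above computes per-user term frequencies. As a rough sketch of what that load/TOKENIZE/group/COUNT pipeline does, here is a minimal Python equivalent. The sample log lines and the whitespace tokenizer are illustrative assumptions, not part of the original; Pig's TOKENIZE also splits on some punctuation.]

```python
from collections import Counter

def user_term_freq(lines):
    """Count (term, user) pairs from CSV lines of the form
    <txid>,<txt>,<user> -- mirroring the Pig script's
    TOKENIZE + group-by-(user, attribute) + COUNT pipeline."""
    counts = Counter()
    for line in lines:
        txid, txt, user = line.split(",", 2)
        # Approximate Pig's TOKENIZE with a whitespace split.
        for token in txt.split():
            counts[(token, user)] += 1
    return counts

# Hypothetical sample log in the <txid>,<txt>,<user> format.
log = [
    "t1,buy milk,alice",
    "t2,buy bread,alice",
    "t3,sell milk,bob",
]
for (term, user), cnt in sorted(user_term_freq(log).items()):
    print(f"{term}\t{user}\t{cnt}")
```

Each output line matches the <txt>\t<user>\t<cnt> format described in the email; in the real pipeline, Pig distributes the grouping and counting across map and reduce tasks instead of a single in-memory Counter.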
