On Wed, Jun 29, 2011 at 3:33 AM, Ted Dunning <[email protected]> wrote:
> Hmmm... not sure. I thought they were all the same. It is possible there
> is a left-over implementation.
>
> Robin? Care to comment?
>

Didn't see the thread. Both are based on the same math. The naivebayes one
uses vectors instead of text.

>
> On Tue, Jun 28, 2011 at 3:01 PM, Sebastian Schelter <[email protected]>
> wrote:
>
> > Is org.apache.mahout.classifier.naivebayes also based on that one? I
> > thought it was only relevant for org.apache.mahout.classifier.bayes?
> >
> >
> > On 28.06.2011 23:58, Ted Dunning wrote:
> >
> >> See here:
> >> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.8572&rank=1
> >>
> >> On Tue, Jun 28, 2011 at 2:43 PM, Sebastian Schelter (JIRA)
> >> <[email protected]> wrote:
> >>
> >>
> >>> [
> >>> https://issues.apache.org/jira/browse/MAHOUT-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056805#comment-13056805
> >>> ]
> >>>
> >>> Sebastian Schelter commented on MAHOUT-746:
> >>> -------------------------------------------
> >>>
> >>> Thank you very much, Sean.
> >>>
> >>> I wonder whether there is some article/paper that describes this
> >>> particular approach of implementing Naive Bayes? A colleague of mine
> >>> with a much deeper statistics background and me took a look at the
> >>> details of the computation today and we were left with some open
> >>> questions.
> >>>
> >>>> Refactoring of the parallel Naive Bayes implementation in
> >>>> org.apache.mahout.classifier.naivebayes
> >>>> -------------------------------------------------------------------
> >>>>
> >>>>                 Key: MAHOUT-746
> >>>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-746
> >>>>             Project: Mahout
> >>>>          Issue Type: Improvement
> >>>>          Components: Classification
> >>>>    Affects Versions: 0.6
> >>>>            Reporter: Sebastian Schelter
> >>>>            Assignee: Sebastian Schelter
> >>>>             Fix For: 0.6
> >>>>
> >>>>         Attachments: MAHOUT-746.patch
> >>>>
> >>>>
> >>>> I refactored the code in org.apache.mahout.classifier.naivebayes to
> >>>> extend AbstractJob, decoupled the model serialization from the job
> >>>> output, extracted trainer classes and tried to clarify naming and
> >>>> reduce code complexity. I also added tests for the training M/R code
> >>>> as well as a toy integration test.
> >>>> It would be great if someone could review my patch to make sure I
> >>>> didn't break anything.
> >>>
> >>> --
> >>> This message is automatically generated by JIRA.
> >>> For more information on JIRA, see: http://www.atlassian.com/software/jira
> >>>
> >>
> >
>
