See here: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.8572&rank=1
On Tue, Jun 28, 2011 at 2:43 PM, Sebastian Schelter (JIRA) <[email protected]> wrote:

>
> [ https://issues.apache.org/jira/browse/MAHOUT-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056805#comment-13056805 ]
>
> Sebastian Schelter commented on MAHOUT-746:
> -------------------------------------------
>
> Thank you very much, Sean.
>
> I wonder whether there is some article/paper that describes this particular
> approach of implementing Naive Bayes? A colleague of mine with a much deeper
> statistics background and me took a look at the details of the computation
> today and we were left with some open questions.
>
> > Refactoring of the parallel Naive Bayes implementation in org.apache.mahout.classifier.naivebayes
> > -------------------------------------------------------------------------------------------------
> >
> >                 Key: MAHOUT-746
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-746
> >             Project: Mahout
> >          Issue Type: Improvement
> >          Components: Classification
> >    Affects Versions: 0.6
> >            Reporter: Sebastian Schelter
> >            Assignee: Sebastian Schelter
> >             Fix For: 0.6
> >
> >         Attachments: MAHOUT-746.patch
> >
> >
> > I refactored the code in org.apache.mahout.classifier.naivebayes to
> > extend AbstractJob, decoupled the model serialization from the job output,
> > extracted trainer classes and tried to clarify naming and reduce code
> > complexity. I also added tests for the training M/R code as well as a toy
> > integration test.
> > It would be great if someone could review my patch to make sure I didn't
> > break anything.
>
> --
> This message is automatically generated by JIRA.
> For more information on JIRA, see: http://www.atlassian.com/software/jira
