Never mind, I'll take a deeper look into that paper :)
On 29.06.2011 00:03, Ted Dunning wrote:
Hmmm... not sure. I thought they were all the same. It is possible there is a leftover implementation. Robin? Care to comment?

On Tue, Jun 28, 2011 at 3:01 PM, Sebastian Schelter <[email protected]> wrote:

Is org.apache.mahout.classifier.naivebayes also based on that one? I thought it was only relevant for org.apache.mahout.classifier.bayes?

On 28.06.2011 23:58, Ted Dunning wrote:

See here: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.8572&rank=1

On Tue, Jun 28, 2011 at 2:43 PM, Sebastian Schelter (JIRA) <[email protected]> wrote:

[ https://issues.apache.org/jira/browse/MAHOUT-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056805#comment-13056805 ]

Sebastian Schelter commented on MAHOUT-746:
-------------------------------------------

Thank you very much, Sean. I wonder whether there is an article or paper that describes this particular approach to implementing Naive Bayes. A colleague of mine with a much deeper statistics background and I took a look at the details of the computation today, and we were left with some open questions.
Refactoring of the parallel Naive Bayes implementation in org.apache.mahout.classifier.naivebayes
--------------------------------------------------------------------------------------------------

Key: MAHOUT-746
URL: https://issues.apache.org/jira/browse/MAHOUT-746
Project: Mahout
Issue Type: Improvement
Components: Classification
Affects Versions: 0.6
Reporter: Sebastian Schelter
Assignee: Sebastian Schelter
Fix For: 0.6
Attachments: MAHOUT-746.patch

I refactored the code in org.apache.mahout.classifier.naivebayes to extend AbstractJob, decoupled the model serialization from the job output, extracted trainer classes and tried to clarify naming and reduce code complexity. I also added tests for the training M/R code as well as a toy integration test. It would be great if someone could review my patch to make sure I didn't break anything.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
