[ 
https://issues.apache.org/jira/browse/MAHOUT-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056805#comment-13056805
 ] 

Sebastian Schelter commented on MAHOUT-746:
-------------------------------------------

Thank you very much, Sean. 

I wonder whether there is an article or paper that describes this particular 
approach to implementing Naive Bayes. A colleague of mine with a much deeper 
statistics background and I took a look at the details of the computation 
today, and we were left with some open questions.

> Refactoring of the parallel Naive Bayes implementation in 
> org.apache.mahout.classifier.naivebayes
> -------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-746
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-746
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.6
>            Reporter: Sebastian Schelter
>            Assignee: Sebastian Schelter
>             Fix For: 0.6
>
>         Attachments: MAHOUT-746.patch
>
>
> I refactored the code in org.apache.mahout.classifier.naivebayes to extend 
> AbstractJob, decoupled the model serialization from the job output, extracted 
> trainer classes and tried to clarify naming and reduce code complexity. I 
> also added tests for the training M/R code as well as a toy integration test.
> It would be great if someone could review my patch to make sure I didn't 
> break anything.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
