[jira] [Comment Edited] (MAHOUT-1493) Port Naive Bayes to the Spark DSL

Andrew Palumbo (JIRA) Wed, 16 Jul 2014 13:39:36 -0700

    [ 
https://issues.apache.org/jira/browse/MAHOUT-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064027#comment-14064027
 ]


Andrew Palumbo edited comment on MAHOUT-1493 at 7/16/14 8:37 PM:
-----------------------------------------------------------------

[~cviebig], the patch looks good. I've made some edits (against your 
develop/issue/1493/4 branch) and will attach an M-1493a.patch shortly.  I put 
the trainComplementary parameter back in, as it is needed to distinguish 
between Standard and Complementary models in the NaiveBayesModel 
constructor.  

I've also added a thetaNormalizer var, which can remain null when passed to 
the NaiveBayesModel constructor unless training a Complementary NB model. See:
 
https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/classifier/naivebayes/NaiveBayesModel.java

I'm not sure if creating a null var as I've done here is best practice in 
Scala, but I wanted to give you an idea of the NaiveBayesModel design.  
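
For comparison, here is a minimal sketch of one common Scala alternative to a 
null var: hold the normalizer in an Option and convert to null only at the 
point where the Java constructor is called. The names NbModelArgs, standard, 
complementary and toJavaArg are hypothetical and are not part of the patch or 
of Mahout, and a plain Array[Double] stands in for the Mahout vector type.

```scala
// Hypothetical stand-in for the arguments handed to the model constructor.
// This is NOT the Mahout API; Array[Double] replaces the real vector type.
final case class NbModelArgs(
    trainComplementary: Boolean,
    thetaNormalizer: Option[Array[Double]])

object NbModelArgs {
  // Standard NB: no normalizer is needed, so it is simply absent.
  val standard: NbModelArgs =
    NbModelArgs(trainComplementary = false, thetaNormalizer = None)

  // Complementary NB: the normalizer has to be supplied by the caller.
  def complementary(theta: Array[Double]): NbModelArgs =
    NbModelArgs(trainComplementary = true, thetaNormalizer = Some(theta))

  // At the boundary with Java code that accepts null, convert explicitly.
  def toJavaArg(args: NbModelArgs): Array[Double] =
    args.thetaNormalizer.orNull
}
```

With this shape the Scala side never handles a null directly, and the 
Standard/Complementary distinction is still carried by trainComplementary.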

As you've noted, there has been a lot of refactoring going on.  As for moving 
the code, I think it might be a good idea to keep this in the `spark` module 
and move the `org.apache.mahout.sparkbindings.drm.classification` package out 
of `org.apache.mahout.sparkbindings.drm` and into a new 
`org.apache.mahout.classification` package.  I believe this would be a good 
place for it for the time being.  There shouldn't be any need to move any of 
the Java code from mrlegacy.


 




> Port Naive Bayes to the Spark DSL
> ---------------------------------
>
>                 Key: MAHOUT-1493
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1493
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>            Reporter: Sebastian Schelter
>            Assignee: Sebastian Schelter
>             Fix For: 1.0
>
>         Attachments: MAHOUT-1493.patch, MAHOUT-1493.patch, MAHOUT-1493.patch, 
> MAHOUT-1493.patch, MAHOUT-1493a.patch
>
>
> Port our Naive Bayes implementation to the new spark dsl. Shouldn't require 
> more than a few lines of code.



