[ 
https://issues.apache.org/jira/browse/MAHOUT-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957242#comment-13957242
 ] 

Andrew Palumbo edited comment on MAHOUT-1369 at 4/2/14 2:01 AM:
----------------------------------------------------------------

(edit: sorry, I just noticed that the first comment explicitly notes that NB 
was newly implemented in 0.7.) 
In the pre-0.7 versions, TF-IDF transformations are done internally to NB. 
Since 0.7, the algorithm is relaxed and the transformations are done externally 
(e.g. via seq2sparse). It looks like the weight (Theta) normalization was never 
properly implemented after that move. It should be a relatively easy fix and 
will allow for all 4 flavors of the NB algorithm from the Rennie paper.
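
For reference, the weight normalization step can be sketched roughly as below. 
This is only an illustration of the idea as I understand it from the paper 
(divide each class's log-weights by the sum of their absolute values), not the 
commented-out Mahout code. The class name, the method names and the smoothing 
parameter alpha are all made up for this example.

```java
import java.util.Arrays;

// Hypothetical sketch, NOT Mahout's TrainNaiveBayesJob or BayesUtils code.
public class ThetaNormalizationSketch {

    // Smoothed log-weights for one class: log((count_i + alpha) / (total + alpha * V)).
    // "alpha" is an assumed smoothing parameter; V is the vocabulary size.
    static double[] logWeights(double[] termCounts, double alpha) {
        double total = 0.0;
        for (double c : termCounts) {
            total += c;
        }
        int vocabSize = termCounts.length;
        double[] w = new double[vocabSize];
        for (int i = 0; i < vocabSize; i++) {
            w[i] = Math.log((termCounts[i] + alpha) / (total + alpha * vocabSize));
        }
        return w;
    }

    // Weight normalization: divide each weight by the sum of the absolute
    // values of all weights of the same class, so classes with more training
    // data do not dominate just because their weights are larger in magnitude.
    static double[] normalize(double[] weights) {
        double norm = 0.0;
        for (double w : weights) {
            norm += Math.abs(w);
        }
        if (norm == 0.0) {
            return weights.clone(); // nothing to scale
        }
        double[] out = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            out[i] = weights[i] / norm;
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy per-class term counts (e.g. summed TF-IDF values from seq2sparse).
        double[] counts = {3.0, 1.0, 0.0};
        double[] normalized = normalize(logWeights(counts, 1.0));
        System.out.println(Arrays.toString(normalized));
    }
}
```

After normalization the absolute values of a class's weights sum to 1, which 
is what makes the per-class scores comparable.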

If there are no objections, as this is a question JIRA, I'll create 3 new JIRAs 
related to NB:

          1.  Update the website to the current NB specs (the current page 
              describes the pre-0.7 implementation)
          2.  Fix the Theta normalization problem
          3.  Address the error reported on the dev list last week by Chandler 
              Burgess re: testnb failing in sequential mode




was (Author: andrew_palumbo):
Going back and looking at the Mahout 0.5 and 0.6 releases, it looks like there 
were some major changes to the Naive Bayes implementation between 0.6 and 0.7. 
NB seems to have been completely refactored/rewritten. In the pre-0.7 versions, 
TF-IDF transformations are done internally to NB. Since 0.7, the algorithm is 
relaxed and the transformations are done externally (e.g. via seq2sparse). It 
looks like the weight (Theta) normalization was never properly implemented 
after that move. It should be a relatively easy fix and will allow for all 4 
flavors of the NB algorithm from the Rennie paper.

If there are no objections, as this is a question JIRA, I'll create 3 new JIRAs 
related to NB:

          1.  Update the website to the current NB specs (the current page 
              describes the pre-0.7 implementation)
          2.  Fix the Theta normalization problem
          3.  Address the error reported on the dev list last week by Chandler 
              Burgess re: testnb failing in sequential mode



> Why is theta normalization for naive bayes classification commented out?
> ------------------------------------------------------------------------
>
>                 Key: MAHOUT-1369
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1369
>             Project: Mahout
>          Issue Type: Question
>          Components: Classification
>    Affects Versions: 0.7, 0.8, 0.9
>         Environment: mahout 0.8
>            Reporter: utku yaman
>            Priority: Minor
>              Labels: features
>             Fix For: 1.0
>
>
> TrainNaiveBayesJob lines 155-158
> and
> BayesUtils lines 86-93
> are commented out, and these lines perform theta normalization for Naive Bayes.
> What is the problem with the code, and is there a plan for correcting these 
> methods?



--
This message was sent by Atlassian JIRA
(v6.2#6252)
