[
https://issues.apache.org/jira/browse/MAHOUT-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957242#comment-13957242
]
Andrew Palumbo edited comment on MAHOUT-1369 at 4/2/14 1:52 AM:
----------------------------------------------------------------
Going back and looking at the Mahout 0.5 and 0.6 releases, it looks like there
were some major changes to the Naive Bayes implementation between 0.6 and 0.7.
NB seems to have been completely refactored/rewritten. In the pre-0.7 versions,
TF-IDF transformations are done internally to NB. In 0.7 and later, the
algorithm is relaxed and the transformations are done externally (e.g. via
seq2sparse). It looks like the weight (theta) normalization was never properly
implemented after that move. It should be a relatively easy fix and will allow
for all 4 flavors of the NB algorithm from the Rennie paper.
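To make the "4 flavors" concrete, here is a minimal Python sketch of the
weighting scheme from Rennie et al.: standard (multinomial) and complement
Naive Bayes, each with or without weight (theta) normalization
w_ci <- w_ci / sum_k |w_ck|. It is not Mahout's TrainNaiveBayesJob or BayesUtils
code. It assumes the per-class feature weights were already TF-IDF transformed
externally (as seq2sparse would do), omits class priors for brevity, and uses
hypothetical function names (class_weights, normalize_weights, classify).
Applying the normalization to both NB and CNB is my reading of "4 flavors".

```python
import math


def class_weights(counts, alpha=1.0, complement=False):
    """Per-class log-weights w[c][i] from per-class feature weight sums.

    counts: list of per-class lists (e.g. summed TF-IDF weights per feature).
    complement=False: standard NB, theta_ci estimated from class c itself.
    complement=True: complement NB, theta_ci estimated from all OTHER classes.
    alpha: Laplace smoothing added to every feature (assumed default of 1.0).
    """
    n_classes = len(counts)
    n_features = len(counts[0])
    weights = []
    for c in range(n_classes):
        if complement:
            rows = [counts[k] for k in range(n_classes) if k != c]
        else:
            rows = [counts[c]]
        feat = [sum(r[i] for r in rows) for i in range(n_features)]
        total = sum(feat)
        weights.append([math.log((feat[i] + alpha) / (total + alpha * n_features))
                        for i in range(n_features)])
    return weights


def normalize_weights(weights):
    """Weight (theta) normalization: divide each class's weights by sum_k |w_ck|."""
    normalized = []
    for row in weights:
        norm = sum(abs(w) for w in row)
        normalized.append([w / norm for w in row])
    return normalized


def classify(doc, weights, complement=False):
    """Return the index of the predicted class for a document feature vector.

    Standard NB picks the class with the highest sum_i f_i * w_ci. CNB weights
    describe the other classes, so it picks the lowest score instead.
    Class priors are omitted in this sketch.
    """
    scores = [sum(f * w for f, w in zip(doc, row)) for row in weights]
    pick = min if complement else max
    return pick(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    counts = [[5, 1, 0], [0, 2, 6]]  # toy per-class weight sums, 3 features
    for complement in (False, True):
        for theta_norm in (False, True):
            w = class_weights(counts, complement=complement)
            if theta_norm:
                w = normalize_weights(w)
            print(complement, theta_norm, classify([3, 0, 0], w, complement))
```

The normalization step is independent of how the weights were estimated, which
is why it can be toggled on either variant. Rennie et al. motivate it as a way
to stop classes whose weight vectors have larger magnitude from dominating the
score.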
If there are no objections, as this is a question JIRA, I'll create 3 new JIRAs
related to NB:
1. Update the website to the current NB specs (the current page is for pre-0.7)
2. Fix the theta normalization problem
3. Address the error reported on the dev list last week by Chandler
Burgess re: testnb failing in sequential mode
> Why is theta normalization for naive bayes classification commented out?
> ------------------------------------------------------------------------
>
> Key: MAHOUT-1369
> URL: https://issues.apache.org/jira/browse/MAHOUT-1369
> Project: Mahout
> Issue Type: Question
> Components: Classification
> Affects Versions: 0.7, 0.8, 0.9
> Environment: mahout 0.8
> Reporter: utku yaman
> Priority: Minor
> Labels: features
> Fix For: 1.0
>
>
> TrainNaiveBayesJob lines 155-158
> and
> BayesUtils lines 86-93
> are commented out, and these lines are for theta normalization for Bayes.
> What is the problem with the code, and is there a plan to correct these
> methods?
--
This message was sent by Atlassian JIRA
(v6.2#6252)