[ https://issues.apache.org/jira/browse/MAHOUT-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Palumbo updated MAHOUT-1504:
-----------------------------------
Attachment: MAHOUT-1504.patch
This patch fixes the thetaSummer job bug. With it, CNB runs with weight
normalization as described in Section 3.2 of the Rennie et al. paper. I decided
to keep it simple and just get weight normalization working, which enables the
full algorithm outlined in Table 4 of that paper.
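For reference, the weight normalization step can be sketched roughly as below. This is a standalone illustration, not the code in the patch: the class and method names are hypothetical, and the smoothing setup (a per-feature alphaI, with the total smoothing equal to alphaI times the number of features) is an assumption.

```java
import java.util.Arrays;

// Hypothetical standalone sketch; not Mahout's trainer or classifier code.
final class WeightNormalizationSketch {

    // Complement log-weights for one class c:
    //   w_i = log((N_i + alphaI) / (N + alphaI * numFeatures))
    // where N_i is the count of feature i over all classes other than c
    // and N is the sum of those counts.
    static double[] complementLogWeights(double[] complementCounts, double alphaI) {
        double total = 0.0;
        for (double count : complementCounts) {
            total += count;
        }
        double denominator = total + alphaI * complementCounts.length;
        double[] weights = new double[complementCounts.length];
        for (int i = 0; i < weights.length; i++) {
            weights[i] = Math.log((complementCounts[i] + alphaI) / denominator);
        }
        return weights;
    }

    // Weight normalization: w_i = w_i / sum_k |w_k|.
    static double[] normalize(double[] weights) {
        double sumAbs = 0.0;
        for (double w : weights) {
            sumAbs += Math.abs(w);
        }
        if (sumAbs == 0.0) {
            return weights.clone(); // nothing to scale
        }
        double[] normalized = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            normalized[i] = weights[i] / sumAbs;
        }
        return normalized;
    }

    public static void main(String[] args) {
        double[] w = complementLogWeights(new double[] {3.0, 1.0, 0.0}, 1.0);
        System.out.println(Arrays.toString(normalize(w)));
    }
}
```

After normalization the absolute values of each class's weights sum to 1, so no class dominates the score simply because its weight vector has a larger magnitude.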
Weight normalization is not needed for standard NB, where the thetaSummer job
is just an added expense. Although the weight summations are all computed, I've
left the weight normalization step commented out in StandardNaiveBayesClassifier.
I am thinking that adding a -w option for weight normalization, or only running
the thetaSummer job when the -c option is supplied, might make sense (the
former may unnecessarily complicate things). Another (probably better) option
would be to store the calculated weights in the model during training so that
they don't need to be recalculated when testing/classifying. These are probably
questions for another JIRA.
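To illustrate the second option, here is a rough sketch of a model that computes each label's normalizer once at training time and reuses it when classifying. Everything in it is hypothetical (the class, its fields, and its methods); it is not Mahout's NaiveBayesModel API, only one way the idea might look.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not Mahout's NaiveBayesModel API.
final class CachedThetaModelSketch {
    private final Map<String, double[]> weightsByLabel = new HashMap<>();
    private final Map<String, Double> thetaNormalizerByLabel = new HashMap<>();

    // Called once per label during training: stores the log-weights and
    // the sum of their absolute values (the per-label normalizer).
    void putLabel(String label, double[] logWeights) {
        double sumAbs = 0.0;
        for (double w : logWeights) {
            sumAbs += Math.abs(w);
        }
        weightsByLabel.put(label, logWeights.clone());
        thetaNormalizerByLabel.put(label, sumAbs);
    }

    // Called at classification time: only a lookup and a division,
    // with no summation over features.
    double normalizedWeight(String label, int feature) {
        double[] weights = weightsByLabel.get(label);
        Double normalizer = thetaNormalizerByLabel.get(label);
        if (weights == null || normalizer == null) {
            throw new IllegalArgumentException("unknown label: " + label);
        }
        double n = normalizer;
        return n == 0.0 ? weights[feature] : weights[feature] / n;
    }
}
```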
Let me know if any changes are needed.
> Enable/fix thetaSummer job in TrainNaiveBayesJob
> ------------------------------------------------
>
> Key: MAHOUT-1504
> URL: https://issues.apache.org/jira/browse/MAHOUT-1504
> Project: Mahout
> Issue Type: Task
> Components: Classification, Examples
> Affects Versions: 0.9
> Reporter: Andrew Palumbo
> Priority: Minor
> Fix For: 1.0
>
> Attachments: MAHOUT-1504.patch
>
>
> A new implementation of Naive Bayes was introduced in 0.7. The weight (theta)
> normalization job was at least partially carried over but not fully
> implemented or enabled. Weight normalization does not affect simple NB or
> CNB; however, enabling it will allow for all NB implementations in the Rennie
> et al. paper.
--
This message was sent by Atlassian JIRA
(v6.2#6252)