[ 
https://issues.apache.org/jira/browse/MAHOUT-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Palumbo updated MAHOUT-1504:
-----------------------------------

    Attachment: MAHOUT-1504.patch

This patch fixes the thetaSummer job bug. With it, CNB runs with weight 
normalization as described in Section 3.2 of the Rennie paper.  I decided to 
keep it simple and just get the weight normalization working.  This enables 
the full algorithm as outlined in Table 4 of Rennie.
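
For reference, here is a rough sketch of what weight normalization means 
for CNB, as I read the Rennie paper: each class's log weights are divided 
by the sum of their absolute values, and a document is assigned to the 
class with the smallest weighted sum.  This is plain Python for 
illustration only, not the Mahout code; the function names, the count 
matrix layout and the smoothing value alpha_i=1.0 are all assumptions.

```python
import math


def complement_weights(counts, alpha_i=1.0):
    """Complement NB log weights (illustrative, not Mahout's code).

    counts[c][i] is the (possibly TF-IDF transformed) count of feature i
    in class c. alpha_i is an assumed per-feature smoothing value.
    """
    num_classes = len(counts)
    num_features = len(counts[0])
    feature_totals = [sum(counts[c][i] for c in range(num_classes))
                      for i in range(num_features)]
    grand_total = sum(feature_totals)
    alpha = alpha_i * num_features
    weights = []
    for c in range(num_classes):
        # Counts over all classes except c (the "complement" of c).
        complement_total = grand_total - sum(counts[c])
        weights.append([
            math.log((feature_totals[i] - counts[c][i] + alpha_i)
                     / (complement_total + alpha))
            for i in range(num_features)
        ])
    return weights


def normalize_weights(weights):
    """Divide each class's weights by the sum of their absolute values."""
    normalized = []
    for row in weights:
        norm = sum(abs(w) for w in row)
        # Guard against an all-zero row (e.g. a single-feature model).
        normalized.append([w / norm for w in row] if norm else list(row))
    return normalized


def classify(weights, doc):
    """Return the index of the class with the smallest weighted sum."""
    scores = [sum(t * w for t, w in zip(doc, row)) for row in weights]
    return min(range(len(scores)), key=scores.__getitem__)
```

After normalize_weights, the absolute weights of every class sum to 1, so 
no class dominates simply because its weights are larger in magnitude.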

Weight normalization is not needed for standard NB, so there the thetaSummer 
job is just an added expense.  Although the weight summations are all done, 
I've left the weight normalization step commented out in 
StandardNaiveBayesClassifier.

I'm thinking that adding a -w option for weight normalization, or only 
running the thetaSummer job when the -c option is supplied, might make sense 
(the former may unnecessarily complicate things).  Another (probably better) 
option would be to store the calculated weights in the model during the 
training phase so that they don't need to be recalculated when 
testing/classifying.  These are probably questions for another JIRA.
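
To make the "store it in the model" idea concrete, here is a minimal 
sketch in plain Python.  It is not the Mahout model class; the class name, 
field names and method are hypothetical and only show the per-label 
normalizers being computed once at training time and reused at 
classification time.

```python
class NaiveBayesModelSketch:
    """Hypothetical model that stores per-label weight normalizers."""

    def __init__(self, weights):
        # weights[label][feature] holds the trained log weights.
        self.weights = weights
        # Computed once here (training time) instead of on every
        # classification call.
        self.theta_normalizers = [sum(abs(w) for w in row)
                                  for row in weights]

    def score(self, label, doc):
        """Weighted sum for one label using the stored normalizer."""
        # Fall back to 1.0 if a label's weights are all zero.
        norm = self.theta_normalizers[label] or 1.0
        raw = sum(t * w for t, w in zip(doc, self.weights[label]))
        return raw / norm
```

A real implementation would also have to serialize the normalizers along 
with the rest of the model, which this sketch leaves out.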

Let me know if any changes are needed.

> Enable/fix thetaSummer job in TrainNaiveBayesJob
> ------------------------------------------------
>
>                 Key: MAHOUT-1504
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1504
>             Project: Mahout
>          Issue Type: Task
>          Components: Classification, Examples
>    Affects Versions: 0.9
>            Reporter: Andrew Palumbo
>            Priority: Minor
>             Fix For: 1.0
>
>         Attachments: MAHOUT-1504.patch
>
>
> A new implementation of Naive Bayes was introduced in 0.7.  The weight 
> (theta) normalization job was at least partially carried over but not fully 
> implemented or enabled.  Weight normalization does not affect simple NB or 
> CNB; however, enabling it will allow for all NB implementations in the 
> Rennie et al. paper. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
