Hi Chandler,

I've been looking at the code and the Rennie paper a bit over the past couple of days. I haven't had too much time with it, but I have seen some of the problems. I may be wrong, and please correct me if I am, but I want to say that for the binary classification problem, Multinomial Naive Bayes (MNB) and Complementary Naive Bayes (CNB) should be essentially the same. In the two-class problem, where this implementation of MNB predicts based on the probability of a document belonging to its class, CNB predicts based on the probability that the document does NOT belong to the ONLY other class. Without the theta-normalization implemented, I don't think CNB and MNB will yield different classifications (for two-class problems).
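To illustrate why the two-class case collapses, here is a minimal sketch in plain Python (not Mahout code; the vocabulary, theta values, and document counts are invented for illustration, and log-priors are ignored). MNB scores class c with sum_i f_i * log(theta_ci), while CNB scores c with the negated log-likelihood of its complement, which for two classes is just the single other class:

```python
import math

# Invented per-class term probabilities (theta) over a 3-term vocabulary.
theta = {
    "A": [0.7, 0.2, 0.1],
    "B": [0.1, 0.3, 0.6],
}
classes = list(theta)
doc = [3, 1, 2]  # term counts f_i for one document

def mnb_score(c):
    # Multinomial NB log-likelihood: sum_i f_i * log(theta_ci)
    return sum(f * math.log(p) for f, p in zip(doc, theta[c]))

def cnb_score(c):
    # Complementary NB: negate the log-likelihood of the complement of c.
    # With exactly two classes, the complement is the single other class.
    other = next(k for k in classes if k != c)
    return -mnb_score(other)

mnb_pred = max(classes, key=mnb_score)
cnb_pred = max(classes, key=cnb_score)
# mnb_score("A") > mnb_score("B")  iff  -mnb_score("B") > -mnb_score("A"),
# so the two classifiers always pick the same winner in the two-class case.
print(mnb_pred, cnb_pred)
```

Weight normalization (the WCNB step in the paper) divides each class's log-weight vector by its own L1 norm, a different constant per class, which is what can break this symmetry and let the two classifiers disagree even with only two classes.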
I tested this out on 20-Newsgroups and can see that CNB and MNB give different classification results with all 20 classes, but they give the same results when restricted to just 2 of the 20. I haven't figured out what's going on with the theta-normalization yet, but it seems to me that it should be implemented either as a different algorithm (WCNB in the paper) or as an option that can be enabled within CNB.

Andy

> Date: Fri, 28 Mar 2014 14:29:54 -0700
> From: [email protected]
> Subject: Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?
> To: [email protected]
>
> .. and please create a JIRA for this, it definitely seems like an issue.
>
> Nevertheless it's time to verify and validate this impl given that the original author has not responded.
>
> On , Suneel Marthi <[email protected]> wrote:
>
> I was alluding to TrainNaiveBayesJob which is MR only. U r right TestNaiveBayesDriver has both MR and sequential. Looking at the code for MR v/s sequential in TestNaiveBayes they both seem to be calling the respective Standard/Complementary Naive Bayes classifiers.
>
> I guess we need to look at the CNB calculations more closely and see if it's doing the right thing.
>
> On Friday, March 28, 2014 5:09 PM, Chandler Burgess <[email protected]> wrote:
>
> Ok, then I should remove it? There's about 2 dozen lines of code in TestNaiveBayesDriver for running sequentially.
>
> -----Original Message-----
> From: Suneel Marthi [mailto:[email protected]]
> Sent: Friday, March 28, 2014 3:51 PM
> To: [email protected]
> Subject: Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?
>
> Bayes doesn't have a non-mapreduce impl, so the -seq flag wouldn't work.
>
> Sent from my iPhone
>
> On Mar 28, 2014, at 4:16 PM, Chandler Burgess <[email protected]> wrote:
>
> > Well, maybe someone can correct me, but this seems disappointing.
> > I uncommented the code in NaiveBayesModel, BayesUtil and TrainNaiveBayesJob, added some trace statements in ComplementaryThetaMapper and ComplementaryNaiveBayesClassifier to verify they were being called, and then ran some tests using trainnb/testnb. There was not a single difference in the classifications when train/testcomplementary was specified vs. standard naïve bayes.
> >
> > Also, running testnb with the -seq flag doesn't appear to work.
> >
> > -----Original Message-----
> > From: Chandler Burgess [mailto:[email protected]]
> > Sent: Thursday, March 27, 2014 5:17 PM
> > To: [email protected]
> > Subject: RE: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?
> >
> > The program I wrote didn't use a model that was trained with Cbayes. After looking at the scorers in SNB and CNB, I figured they would give different results even on a model not trained with CNB. That could very well be ignorance on my part as to the math.
> >
> > However, I did some command line tests using -c on both training and testing and didn't see any difference in the testnb output.
> > ________________________________________
> > From: Suneel Marthi <[email protected]>
> > Sent: Thursday, March 27, 2014 5:12 PM
> > To: [email protected]
> > Cc: [email protected]
> > Subject: Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?
> >
> > Just checking, u r testing Cbayes on a model that's already been trained using Cbayes, correct?
> >
> > Also, the jira I mentioned earlier was fixed for .9, so u should be good. No code changes were done to naive bayes since .9.
> >
> > Sent from my iPhone
> >
> >> On Mar 27, 2014, at 6:01 PM, Chandler Burgess <[email protected]> wrote:
> >>
> >> Ok, I'll uncomment those lines and see.
> >> I also have plenty of test data available (I'm doing document classification with unbalanced classes), so I'll see if it improves there as well.
> >>
> >> Also, I'll try to make some time in the next week and go over the algorithm in detail compared with the paper as an extra check.
> >>
> >> Thanks,
> >> Chandler
> >> ________________________________________
> >> From: Sebastian Schelter <[email protected]>
> >> Sent: Thursday, March 27, 2014 4:01 PM
> >> To: [email protected]
> >> Subject: Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?
> >>
> >> Hi Chandler,
> >>
> >> I think a good way to go would be to re-enable theta normalization and run the classification examples that we already have to see how it affects the result (and make sure it improves the result).
> >>
> >> Would be great to have this fixed. I'm also planning to port NB to our Spark DSL very soon (should be just a few lines of code).
> >>
> >> --sebastian
> >>
> >>> On 03/27/2014 09:07 PM, Suneel Marthi wrote:
> >>> Which Mahout version r u running? While it's true that ThetaNormalizer is still disabled today, Mahout-1389 fixes a bug wherein Complementary NB wasn't being called when invoked.
> >>>
> >>> Please test with Mahout 0.9 or trunk.
> >>>
> >>> On Thursday, March 27, 2014 3:53 PM, Chandler Burgess <[email protected]> wrote:
> >>>
> >>> Hello all,
> >>>
> >>> It seems Robin Anil hasn't responded, and no one is sure of the status on this. What needs to be done on this, and/or what can I do to help? I'm no ML expert, but I do have the paper and should be able to verify/fix the implementation. I'm REALLY interested in using the CNB classifier, since it seems well suited to the problem I'm trying to tackle, before I give up and use something else.
> >>>
> >>> I've done tests and see no difference when -c is passed on the command line for training or testing. I also wrote a program to print the scores using StandardNaiveBayesClassifier and ComplementaryNaiveBayesClassifier in a binary classification problem and see no difference between the scores, so it seems complementary naïve bayes is completely disabled.
> >>>
> >>> Thanks,
> >>> Chandler Burgess
