Hi Chandler,

I've been looking at the code and the Rennie paper a bit over the past couple of days. I haven't had too much time with it, but I have seen some of the problems. I may be wrong, and please correct me if I am, but I want to say that for the binary classification problem, Multinomial Naive Bayes (MNB) and Complementary Naive Bayes (CNB) should be essentially the same. In the two-class problem, where this implementation of MNB predicts based on the probability of a document belonging to its class, CNB predicts based on the probability that the document does NOT belong to the ONLY other class. Without the theta-normalization implemented, I don't think CNB and MNB will yield different classifications (for two-class problems).
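To illustrate why the two-class case collapses, here is a minimal sketch in plain Python (not Mahout code; the vocabulary, theta values, and document counts are invented for illustration, and log-priors are ignored). MNB scores class c with sum_i f_i * log(theta_ci), while CNB scores c with the negated log-likelihood of its complement, which for two classes is just the single other class:

```python
import math

# Invented per-class term probabilities (theta) over a 3-term vocabulary.
theta = {
    "A": [0.7, 0.2, 0.1],
    "B": [0.1, 0.3, 0.6],
}
classes = list(theta)
doc = [3, 1, 2]  # term counts f_i for one document

def mnb_score(c):
    # Multinomial NB log-likelihood: sum_i f_i * log(theta_ci)
    return sum(f * math.log(p) for f, p in zip(doc, theta[c]))

def cnb_score(c):
    # Complementary NB: negate the log-likelihood of the complement of c.
    # With exactly two classes, the complement is the single other class.
    other = next(k for k in classes if k != c)
    return -mnb_score(other)

mnb_pred = max(classes, key=mnb_score)
cnb_pred = max(classes, key=cnb_score)
# mnb_score("A") > mnb_score("B")  iff  -mnb_score("B") > -mnb_score("A"),
# so the two classifiers always pick the same winner in the two-class case.
print(mnb_pred, cnb_pred)
```

Weight normalization (the WCNB step in the paper) divides each class's log-weight vector by its own L1 norm, a different constant per class, which is what can break this symmetry and let the two classifiers disagree even with only two classes.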
I tested this out on 20-Newsgroups and can see that CNB and MNB give different classification results with all 20 classes, but they give the same results when restricted to just 2 of the 20. I haven't figured out what's going on with the theta-normalization yet, but it seems to me that it should be implemented either as a different algorithm (WCNB in the paper) or as an option that can be enabled within CNB.

Andy

> Date: Fri, 28 Mar 2014 14:29:54 -0700
> From: [email protected]
> Subject: Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?
> To: [email protected]
>
> .. and please create a JIRA for this, it definitely seems like an issue.
>
> Nevertheless it's time to verify and validate this impl given that the original author has not responded.
>
> On , Suneel Marthi <[email protected]> wrote:
>
> I was alluding to TrainNaiveBayesJob which is MR only. U r right TestNaiveBayesDriver has both MR and sequential. Looking at the code for MR v/s sequential in TestNaiveBayes they both seem to be calling the respective Standard/Complementary Naive Bayes classifiers.
>
> I guess we need to look at the CNB calculations more closely and see if it's doing the right thing.
>
> On Friday, March 28, 2014 5:09 PM, Chandler Burgess <[email protected]> wrote:
>
> Ok, then I should remove it? There's about 2 dozen lines of code in TestNaiveBayesDriver for running sequentially.
>
> -----Original Message-----
> From: Suneel Marthi [mailto:[email protected]]
> Sent: Friday, March 28, 2014 3:51 PM
> To: [email protected]
> Subject: Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?
>
> Bayes doesn't have a non-mapreduce impl, so the -seq flag wouldn't work.
>
> Sent from my iPhone
>
> On Mar 28, 2014, at 4:16 PM, Chandler Burgess <[email protected]> wrote:
>
> > Well, maybe someone can correct me, but this seems disappointing.
> > I uncommented the code in NaiveBayesModel, BayesUtil and TrainNaiveBayesJob, added some trace statements in ComplementaryThetaMapper and ComplementaryNaiveBayesClassifier to verify they were being called, and then ran some tests using trainnb/testnb. There was not a single difference in the classifications when train/testcomplementary was specified vs. standard naïve bayes.
> >
> > Also, running testnb with the -seq flag doesn't appear to work.
> >
> > -----Original Message-----
> > From: Chandler Burgess [mailto:[email protected]]
> > Sent: Thursday, March 27, 2014 5:17 PM
> > To: [email protected]
> > Subject: RE: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?
> >
> > The program I wrote didn't use a model that was trained with Cbayes. After looking at the scorers in SNB and CNB, I figured they would give different results even on a model not trained with CNB. That could very well be ignorance on my part as to the math.
> >
> > However, I did some command line tests using -c on both training and testing and didn't see any difference in the testnb output.
> > ________________________________________
> > From: Suneel Marthi <[email protected]>
> > Sent: Thursday, March 27, 2014 5:12 PM
> > To: [email protected]
> > Cc: [email protected]
> > Subject: Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?
> >
> > Just checking, u r testing Cbayes on a model that's already been trained using Cbayes, correct?
> >
> > Also, the jira I mentioned earlier was fixed for .9, so u should be good. No code changes were done to naive bayes since .9.
> >
> > Sent from my iPhone
> >
> >> On Mar 27, 2014, at 6:01 PM, Chandler Burgess <[email protected]> wrote:
> >>
> >> Ok, I'll uncomment those lines and see.
> >> I also have plenty of test data available (I'm doing document classification with unbalanced classes), so I'll see if it improves there as well.
> >>
> >> Also, I'll try to make some time in the next week and go over the algorithm in detail compared with the paper as an extra check.
> >>
> >> Thanks,
> >> Chandler
> >> ________________________________________
> >> From: Sebastian Schelter <[email protected]>
> >> Sent: Thursday, March 27, 2014 4:01 PM
> >> To: [email protected]
> >> Subject: Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?
> >>
> >> Hi Chandler,
> >>
> >> I think a good way to go would be to re-enable theta normalization and run the classification examples that we already have to see how it affects the result (and make sure it improves the result).
> >>
> >> Would be great to have this fixed. I'm also planning to port NB to our Spark DSL very soon (should be just a few lines of code).
> >>
> >> --sebastian
> >>
> >>> On 03/27/2014 09:07 PM, Suneel Marthi wrote:
> >>> Which Mahout version r u running? While it's true that ThetaNormalizer is still disabled today, Mahout-1389 fixes a bug wherein Complementary NB wasn't being called when invoked.
> >>>
> >>> Please test with Mahout 0.9 or trunk.
> >>>
> >>> On Thursday, March 27, 2014 3:53 PM, Chandler Burgess <[email protected]> wrote:
> >>>
> >>> Hello all,
> >>>
> >>> It seems Robin Anil hasn't responded, and no one is sure of the status on this. What needs to be done on this, and/or what can I do to help? I'm no ML expert, but I do have the paper and should be able to verify/fix the implementation. I'm REALLY interested in using the CNB classifier, since it seems well suited to the problem I'm trying to tackle, before I give up and use something else.
> >>>
> >>> I've done tests and see no difference when -c is passed on the command line for training or testing. I also wrote a program to print the scores using StandardNaiveBayesClassifier and ComplementaryNaiveBayesClassifier in a binary classification problem and see no difference between the scores, so it seems complementary naïve bayes is completely disabled.
> >>>
> >>> Thanks,
> >>> Chandler Burgess
