On Wed, Apr 28, 2010 at 11:25 PM, Gurudev Devanla <betaco...@gmail.com> wrote:
> This is my first post ever on any open source mailing list. So, please
> excuse me if I am not following certain standards.

You are doing great.

> I was walking through the code for the Naive Bayes classifier and I
> noticed that in TestClassifier.java, at the point where the document
> weights are calculated, the probability of the class (label) is not
> taken into consideration. My understanding of document weight in Naive
> Bayes is:
>
> Pr(C|D) = Pr(D|C) * Pr(C), but in the implementation I have downloaded,
> I don't see Pr(C) being used in the calculation.

Actually, the real computation is

    pr(C and D) = pr(D | C) * pr(C)

    pr(C | D) = pr(C and D) / pr(D)
              = pr(D | C) * pr(C) / pr(D)

With D fixed to the single document under consideration, we don't need to
consider pr(D) because

    argmax_C pr(C | D) = argmax_C pr(C, D)

You are correct, however, that pr(C) might well be considered. It is
conventionally assumed that the probabilities of all classes are equal, so
that this term can be ignored. If you have information about the a priori
prevalence of the different categories, it would not be amiss to include
this factor.

This point comes up in equation (3) of the paper "Tackling the Poor
Assumptions of Naive Bayes Text Classifiers" by Jason Rennie and others
that Robin mentions, where log pr(C) is written as b_c. Just after this,
however, the authors say:

    "the class probabilities tend to be overpowered by the combination of
    word probabilities, so we use a uniform prior estimate for simplicity"

This is equivalent to saying that pr(C) = 1 / m where m is the number of
categories.

If you have trouble getting the PDF that Robin mentioned (CiteSeerX is
like a yo-yo lately), you can get the slides for a talk by Jason on the
same topic:

http://people.csail.mit.edu/jrennie/talks/icml03.pdf
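To make the argmax argument concrete, here is a minimal Java sketch (not
Mahout's actual code; the class names, maps, and example numbers are all
hypothetical). It scores each class as log pr(D|C) + log pr(C) and shows
that a uniform prior shifts every score by the same constant, leaving the
winning class unchanged, while a skewed prior can flip the decision:

```java
import java.util.HashMap;
import java.util.Map;

public class NaiveBayesPriorExample {

    // Hypothetical sketch: pick argmax_C [ log pr(D|C) + log pr(C) ].
    // docLogLikelihood maps class -> log pr(D|C) for one fixed document D;
    // classLogPrior maps class -> log pr(C).
    static String classify(Map<String, Double> docLogLikelihood,
                           Map<String, Double> classLogPrior) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : docLogLikelihood.entrySet()) {
            double score = e.getValue() + classLogPrior.get(e.getKey());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up log-likelihoods for one document under two classes.
        Map<String, Double> logLik = new HashMap<>();
        logLik.put("sports", -10.0);
        logLik.put("politics", -9.0);

        // Uniform prior: log(1/m) with m = 2 classes; adds the same
        // constant to every score, so it cannot change the argmax.
        Map<String, Double> uniform = new HashMap<>();
        uniform.put("sports", Math.log(0.5));
        uniform.put("politics", Math.log(0.5));
        System.out.println(classify(logLik, uniform)); // politics

        // Skewed prior: a strong enough pr(C) can outweigh the
        // likelihood difference and flip the decision.
        Map<String, Double> skewed = new HashMap<>();
        skewed.put("sports", Math.log(0.9));
        skewed.put("politics", Math.log(0.1));
        System.out.println(classify(logLik, skewed)); // sports
    }
}
```

This is exactly the b_c term from equation (3) of the Rennie et al. paper:
dropping it is the same as using the uniform map above.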