On Wed, Apr 28, 2010 at 11:25 PM, Gurudev Devanla <betaco...@gmail.com> wrote:
> This is my first post ever on any open source mailing list. So, please
> excuse me if I am not following certain standards.

You are doing great.

> I was walking through the code for the Naive Bayes classifier and I
> noticed that in TestClassifier.java, at the point where the document
> weights are calculated, the probability of the class (label) is not
> taken into consideration. My understanding of document weight in Naive
> Bayes is:
>
> Pr(C|D) = Pr(D|C) * Pr(C), but in the implementation I have downloaded,
> I don't see Pr(C) being used in the calculation.

Actually, the real computation is

    pr(C and D) = pr(D | C) * pr(C)

    pr(C | D) = pr(C and D) / pr(D)
              = pr(D | C) * pr(C) / pr(D)

With D fixed to the single document under consideration, we don't need to
consider pr(D) because

    argmax_C pr(C | D) = argmax_C pr(C, D)

You are correct, however, that pr(C) might well be considered. It is
conventionally assumed that the probabilities of all classes are equal, so
that this term can be ignored. If you have information about the a priori
prevalence of the different categories, it would not be amiss to include
this factor.

This point comes up in equation (3) of the paper "Tackling the Poor
Assumptions of Naive Bayes Text Classifiers" by Jason Rennie and others
that Robin mentions, where log pr(C) is written as b_c. Just after this,
however, the authors say:

    "the class probabilities tend to be overpowered by the combination of
    word probabilities, so we use a uniform prior estimate for simplicity"

This is equivalent to saying that pr(C) = 1 / m where m is the number of
categories.

If you have trouble getting the PDF that Robin mentioned (CiteSeerX is
like a yo-yo lately), you can get the slides for a talk by Jason on the
same topic:

http://people.csail.mit.edu/jrennie/talks/icml03.pdf
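To make the argmax argument concrete, here is a minimal Java sketch (not
Mahout's actual code; the class names, maps, and example numbers are all
hypothetical). It scores each class as log pr(D|C) + log pr(C) and shows
that a uniform prior shifts every score by the same constant, leaving the
winning class unchanged, while a skewed prior can flip the decision:

```java
import java.util.HashMap;
import java.util.Map;

public class NaiveBayesPriorExample {

    // Hypothetical sketch: pick argmax_C [ log pr(D|C) + log pr(C) ].
    // docLogLikelihood maps class -> log pr(D|C) for one fixed document D;
    // classLogPrior maps class -> log pr(C).
    static String classify(Map<String, Double> docLogLikelihood,
                           Map<String, Double> classLogPrior) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : docLogLikelihood.entrySet()) {
            double score = e.getValue() + classLogPrior.get(e.getKey());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up log-likelihoods for one document under two classes.
        Map<String, Double> logLik = new HashMap<>();
        logLik.put("sports", -10.0);
        logLik.put("politics", -9.0);

        // Uniform prior: log(1/m) with m = 2 classes; adds the same
        // constant to every score, so it cannot change the argmax.
        Map<String, Double> uniform = new HashMap<>();
        uniform.put("sports", Math.log(0.5));
        uniform.put("politics", Math.log(0.5));
        System.out.println(classify(logLik, uniform)); // politics

        // Skewed prior: a strong enough pr(C) can outweigh the
        // likelihood difference and flip the decision.
        Map<String, Double> skewed = new HashMap<>();
        skewed.put("sports", Math.log(0.9));
        skewed.put("politics", Math.log(0.1));
        System.out.println(classify(logLik, skewed)); // sports
    }
}
```

This is exactly the b_c term from equation (3) of the Rennie et al. paper:
dropping it is the same as using the uniform map above.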