Could be due to the way normalization is done. How is CNB (Complementary Naive
Bayes) performing? Please share the confusion matrices and per-label precision.
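For reference, here is a minimal sketch of the numbers being asked for: a confusion matrix and per-label precision (plus recall, which shows directly whether one label is absorbing the others). It is a generic Python illustration over parallel lists of actual and predicted labels, not the output format of Mahout's test tooling. The function names and the label strings in the usage note are hypothetical.

```python
from collections import Counter, defaultdict


def confusion_matrix(actual, predicted):
    """Return counts[actual_label][predicted_label] as nested Counters."""
    counts = defaultdict(Counter)
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    return counts


def per_label_precision(actual, predicted):
    """Precision per predicted label: correct predictions / all predictions of that label."""
    predicted_totals = Counter(predicted)
    correct = Counter(p for a, p in zip(actual, predicted) if a == p)
    return {label: correct[label] / total for label, total in predicted_totals.items()}


def per_label_recall(actual, predicted):
    """Recall per actual label: correct predictions / all examples of that label."""
    actual_totals = Counter(actual)
    correct = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {label: correct[label] / total for label, total in actual_totals.items()}
```

If the smallest label is dominating, it shows up as low precision for that label (many rows of the confusion matrix pile into its column) and low recall for the larger labels. For example, with hypothetical labels `actual = ["dev", "dev", "dev", "user", "user", "small"]` and `predicted = ["small", "dev", "small", "small", "user", "small"]`, precision for `small` is 0.25 while recall for `dev` is only 1/3.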

On Mon, Oct 10, 2011 at 11:20 PM, Grant Ingersoll <[email protected]> wrote:

> I was trying the Naive Bayes classifier via the build-asf-email.sh file I
> committed the other day on a data set that had a fairly significant
> variation in the number of messages per training label and am noticing
> (still need to validate more) that the label with the least number of
> examples is often dominating the results.  This seems counterintuitive to
> me.  I would have expected the largest set would have dominated the results.
>  If I even out the number of items per label, then I get reasonable results.
>  Any thoughts on what I am seeing?  If you are interested, I can share the
> details of the runs.
>
> -Grant
>
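"Evening out" the labels, as described in the quoted message, can be done by downsampling every label to the size of the smallest one. The sketch below is a generic Python illustration under that assumption (random downsampling with a fixed seed). It is not what build-asf-email.sh does, and the function name and `(label, document)` input shape are hypothetical.

```python
import random
from collections import defaultdict


def balance_by_downsampling(examples, seed=0):
    """Downsample (label, document) pairs so every label has the same count.

    Each label is reduced to the size of the smallest label by sampling
    without replacement; a fixed seed keeps the split reproducible.
    """
    by_label = defaultdict(list)
    for label, doc in examples:
        by_label[label].append((label, doc))
    if not by_label:
        return []
    target = min(len(items) for items in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    return balanced
```

Comparing the confusion matrices from a balanced run and an unbalanced run on the same test set helps separate a class-size effect from other causes.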
