I was trying the Naive Bayes classifier via the build-asf-email.sh file I 
committed the other day on a data set that had a fairly significant variation 
in the number of messages per training label, and I am noticing (still need to 
validate more) that the label with the fewest examples often dominates the 
results.  This seems counterintuitive to me.  I would have expected the label 
with the most examples to dominate.  If I even out the number of items per 
label, then I get reasonable results.  Any thoughts on what 
I am seeing?  If you are interested, I can share the details of the runs.
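
To make "evening out the number of items per label" concrete, here is a minimal 
sketch of one way to do it: randomly downsample every label to the size of the 
smallest one.  It is only an illustration in Python, not what 
build-asf-email.sh does; the function name downsample_per_label and the 
(label, message) pair layout are assumptions made for the example.

```python
import random
from collections import defaultdict


def downsample_per_label(examples, seed=0):
    """Return a balanced list of (label, message) pairs.

    `examples` is an iterable of (label, message) pairs (a hypothetical
    layout). Every label is randomly cut down to the size of the
    smallest label, so all labels end up with the same count.
    """
    # Group the messages by their training label.
    by_label = defaultdict(list)
    for label, message in examples:
        by_label[label].append(message)

    if not by_label:
        return []

    # The smallest label decides how many examples each label keeps.
    smallest = min(len(messages) for messages in by_label.values())

    # A seeded generator keeps the sample reproducible between runs.
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        for message in rng.sample(by_label[label], smallest):
            balanced.append((label, message))
    return balanced
```

Downsampling throws away data from the larger labels, so it is mainly useful as 
a quick check on whether the skew in label sizes is what drives the odd results.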

-Grant
