On 22 Jan 2008 at 9:04, Craig Schmitt wrote:

> A better solution would be to modify rebuildspamdb.pl to look at the
> spam/notspam word counts in the corpus and impose some limits on the number 
> of words included in the spamdb output to bring the norm back into the 
> acceptable range.
> 
> Thoughts/comments?

I think it's possible to put too much emphasis on the
Bayesian norm.  I have a very low value, but have always 
got very accurate results - it just depends on your spam 
mix, and your examples show how different they can be.

If rebuildspamdb is altered to use word counts, I would
still like the option of using the current message 
count/byte count method.

paul


_______________________________________________
Assp-test mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-test
