On 22 Jan 2008 at 9:04, Craig Schmitt wrote:

> A better solution would be to modify rebuildspamdb.pl to look at the
> spam/notspam word counts in the corpus and impose some limits on the number
> of words included in the spamdb output to bring the norm back into the
> acceptable range.
>
> Thoughts/comments?

I think it's possible to put too much emphasis on the Bayesian norm. Mine is very low, but I have always had very accurate results. It just depends on your spam mix, and your examples show how different they can be.

If rebuildspamdb is altered to use word counts, I would still like the option of using the current message count/byte count method.

paul
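
To make the word-count proposal concrete, here is a rough sketch of what such a limit might look like. It is not ASSP code and not how rebuildspamdb.pl works today. It is written in Python purely for illustration, and everything in it is an assumption: the norm is taken to be the ratio of total spam word count to total notspam word count, the acceptable range (0.5 to 1.5) is invented, and the names `word_norm`, `trim_to_budget` and `rebalance` are hypothetical.

```python
def word_norm(spam_counts, ham_counts):
    """Assumed definition: total spam word count / total notspam word count."""
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    return spam_total / ham_total if ham_total else 1.0


def trim_to_budget(counts, budget):
    """Keep the most frequent words whose summed counts fit within budget."""
    kept = {}
    total = 0
    # Most frequent first; ties broken alphabetically so the result is stable.
    for word, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if total + n > budget:
            continue  # this word would overshoot; a rarer one may still fit
        kept[word] = n
        total += n
    return kept


def rebalance(spam_counts, ham_counts, low=0.5, high=1.5):
    """Trim the larger side so the assumed norm lands inside [low, high]."""
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    norm = word_norm(spam_counts, ham_counts)
    if norm > high:
        # Too much spam weight: cap the spam words written out.
        spam_counts = trim_to_budget(spam_counts, high * ham_total)
    elif norm < low:
        # Too much notspam weight: cap the notspam words written out.
        ham_counts = trim_to_budget(ham_counts, spam_total / low)
    return spam_counts, ham_counts


if __name__ == "__main__":
    spam = {"viagra": 60, "free": 30, "click": 20, "now": 10}
    ham = {"meeting": 25, "report": 15}
    new_spam, new_ham = rebalance(spam, ham)
    print(word_norm(spam, ham), "->", word_norm(new_spam, new_ham))
```

A limit like this throws away words from the larger corpus, so it would make sense as a switch alongside the existing message count/byte count method, not as a replacement for it.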
