On the question of norm optimizing: It appears that adjusting MaxFiles or MaxBytes on the front end when messages are collected isn't the best place to perform spamdb normalization.
Adjusting MaxFiles to collect more messages in spam or notspam carries no guarantee of effectiveness. In my case I have about 6500 spam messages with 1.1M words and 2500 notspam messages with 1.6M words. That imbalance is probably because the spam vocabulary is very limited compared to the usual ham vocabulary.

So if I adjust MaxFiles to 6500 (the higher count of the two), that won't effectively increase spam words to bring the norm closer to 1, because the spam directory would already be at MaxFiles. If anything, it has the opposite effect by collecting even more notspam messages (and words).

A better solution would be to modify rebuildspamdb.pl to look at the spam/notspam word counts in the corpus and impose some limits on the number of words included in the spamdb output to bring the norm back into the acceptable range.

Thoughts/comments?

Regards,
Craig

----- Original Message -----
From: "Steve Thompson" <[EMAIL PROTECTED]>
To: "'ASSP development mailing list'" <[email protected]>
Sent: Tuesday, January 22, 2008 8:35 AM
Subject: Re: [Assp-test] Do norm optimizing question

> >>
> >> Ok, i changed that in the latest release.
> >>
> >> fritz
> >>
>
> What exactly was changed here?
>
> Is it supposed to increase the norm if it is low?

_______________________________________________
Assp-test mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-test
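
To make the word-count cap suggested above concrete, here is a minimal sketch. It is written in Python for illustration and is not the Perl in rebuildspamdb.pl. It assumes the norm is the ratio of spam word count to notspam word count, and that the "acceptable range" is 0.9 to 1.1; both are assumptions, as are the function names norm and word_limits.

```python
def norm(spam_words, ham_words):
    """Ratio of spam to notspam word counts (assumed definition of the norm)."""
    return spam_words / ham_words if ham_words else 1.0


def word_limits(spam_words, ham_words, lo=0.9, hi=1.1):
    """Return (spam_cap, ham_cap) so that the capped ratio lies in [lo, hi].

    lo and hi are hypothetical bounds for the acceptable norm range.
    Only the larger side is cut back; the smaller side is left as is.
    """
    n = norm(spam_words, ham_words)
    if n < lo:
        # Too many notspam words: cap notspam so spam/notspam is about lo.
        return spam_words, int(spam_words / lo)
    if n > hi:
        # Too many spam words: cap spam so spam/notspam is about hi.
        return int(ham_words * hi), ham_words
    return spam_words, ham_words


# Word counts from the corpus described in this message.
spam_cap, ham_cap = word_limits(1_100_000, 1_600_000)
print(norm(1_100_000, 1_600_000))  # 0.6875
print(spam_cap, ham_cap)           # 1100000 1222222
```

With these numbers the uncapped norm is 0.6875, and capping notspam at roughly 1.22M words would bring it to about 0.9 without touching MaxFiles. Which notspam words to drop to reach the cap is a separate question that this sketch does not address.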
