<[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> Hello Eric,
>
> I may be trying to do "premature optimization" that is the
> root of all evil as we know, but I found that if I train
> Bayesian filter with as few as 50 spams and 50 notspams this
> already gives pretty good results and with careful selection
> of spam probability one can throw away a good portion of
> spam already and not have to look through it manually.
>
> That script is just designed to make learning curve of
> filter steeper really.

Cool. I took a different approach, which may or may not be of use to you. I actually had a good cross-section of spam/junk mail that I hadn't deleted over the last many months (some 30K+), so I extracted 20K of those, along with 20K valid emails accumulated in my own inbox over the years, and dumped them all into the corresponding spam/notspam folders. I ran the rebuildspamdb.pl script and presto, instant spamdb.

Maybe a more ideal situation would be to have several people in your organization each extract a subset of their historical emails, or, if you have access to them on a central server, grab them from there. I wouldn't expect there to be privacy issues, as you aren't reading them; you are just running them through a spam analyzer. As for finding a good source of spam, I don't think that is a particularly big issue, or we wouldn't be here discussing it already. :)

In my case, it seems to have worked fairly well at separating spam from notspam so far. There is still some manual sorting to do, as well as figuring out which emails belong on redlists and/or NP lists, but I am slowly getting there.

Thanks,
Eric

_______________________________________________
Assp-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-user
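
For anyone who wants to try the bulk-seeding approach described above, here is a rough sketch of the sampling step in Python. It is not part of ASSP: the function name sample_messages and every path are made up for illustration, and it assumes one message per file in the source directory. It only copies a random subset of files into a target folder; rebuildspamdb.pl still has to be run afterwards.

```python
import random
import shutil
from pathlib import Path


def sample_messages(src_dir, dest_dir, count, seed=None):
    """Copy up to `count` randomly chosen files from src_dir into dest_dir.

    Hypothetical helper, not an ASSP tool. Returns the destination paths.
    """
    src = Path(src_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Assumption: one message per file; subdirectories are ignored.
    candidates = sorted(p for p in src.iterdir() if p.is_file())

    # Take everything if fewer than `count` messages are available.
    rng = random.Random(seed)
    chosen = rng.sample(candidates, min(count, len(candidates)))

    copied = []
    for path in chosen:
        target = dest / path.name
        shutil.copy2(path, target)  # copy, so the original archive is untouched
        copied.append(target)
    return copied
```

It would be called once per corpus, for example sample_messages("archive/junk", "assp/spam", 20000) and sample_messages("archive/inbox", "assp/notspam", 20000), where both the archive and assp paths are placeholders for wherever your mail and your ASSP collection folders actually live. Keeping the two counts equal mirrors the 20K/20K split mentioned above.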
