<[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> Hello Eric,
>
> I may be trying to do "premature optimization" that is the
> root of all evil as we know, but I found that if I train
> Bayesian filter with as few as 50 spams and 50 notspams this
> already gives pretty good results and with careful selection
> of spam probability one can throw away a good portion of
> spam already and not have to look through it manually.
>
> That script is just designed to make learning curve of
> filter steeper really.

Cool.  I took a different approach, which may or may not be of use to you.  I
actually had a good cross-section of spam/junk mail that I hadn't deleted
over the last many months (some 30K+), so I extracted 20K of those, along
with 20K valid emails accumulated in my own inbox over the years, and dumped
them all in the corresponding spam/notspam folders.  I ran the
rebuildspamdb.pl script and presto, instant spamdb.
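
For anyone who wants to script the extraction step, here is a rough Python
sketch of how it could be done. It is not what I actually ran. It assumes each
message is stored as its own file, and the function name and the directory
names in the usage comment are made up for illustration.

```python
import random
import shutil
from pathlib import Path


def sample_corpus(source_dir, dest_dir, count, seed=0):
    """Copy up to `count` randomly chosen message files into dest_dir.

    Returns the number of files copied.
    """
    source = Path(source_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Sort first so that a given seed always selects the same messages.
    messages = sorted(p for p in source.iterdir() if p.is_file())

    rng = random.Random(seed)
    chosen = rng.sample(messages, min(count, len(messages)))

    for path in chosen:
        shutil.copy2(path, dest / path.name)
    return len(chosen)


# Hypothetical usage (paths are assumptions, adjust to your own layout):
#   sample_corpus("archive/junk", "assp/spam", 20000)
#   sample_corpus("archive/inbox", "assp/notspam", 20000)
# After that, run rebuildspamdb.pl as usual to build the spamdb.
```

Sampling at random rather than taking the newest 20K should keep the
cross-section spread over the whole period the mail was collected.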

Maybe a more ideal situation would be to have several people from within your
organization each extract a subset of their historical emails, or if you
have access to them on a central server, grab them from there.  I wouldn't
expect there to be privacy issues as you aren't reading them - you are just
running them through a spam analyzer.  As for finding a good source of spam,
I don't think that is a particularly big issue, or we wouldn't be here
discussing it already. :)

In my case, it seems to have worked fairly well at separating spam from
notspam so far.  There is still some manual sorting to do, as well as
figuring out which emails belong on redlists and/or NP lists, but I am slowly
getting there.

Thanks,

Eric




_______________________________________________
Assp-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-user
