Chris, I came across this link as well... I'm convinced this is a far more effective spam blocker than any blacklist/checksum/group spam blocker. Looks very very promising.
I went ahead and put together a bunch of the code for this. I thought about how best to build the corpus, and for my money IMAP folders are the way to go. I'm working on an Ant task that could, on a daily or weekly basis, trawl a set of IMAP folders to build the good and bad corpora.

So far I've written code to tokenize MimeMessages, the code that compares the good and bad corpora and builds the probability token set, the Bayesian calculator that combines the probabilities of the 15 most interesting words, and some other related utilities. It's still a ways from being anything useful, and it would be really great once James has solid IMAP support.

The hard part about this approach is that you need a decent-sized corpus to make it really usable. I think it's pretty clear you could have a matcher use the probability set to mark a message as spam or not, but again, building that corpus is the hardest part.

Serge Knystautas
Loki Technologies
http://www.lokitech.com/

----- Original Message -----
From: "Chris Means" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, August 16, 2002 2:48 PM
Subject: Anti-SPAM mailet

> Would anyone be interested in developing a mailet (or whatever) to
> implement some of the anti-spam techniques described in this article
> mentioned on /.?
>
> http://www.paulgraham.com/spam.html
>
> I'd rather it put a flag in the email so I could filter it in my email
> client, but it would be nice to have the option of automatically forwarding
> it to SPAMCop etc. for reporting purposes.
>
> Any thoughts?
>
> Thanks.
>
> -Chris
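For anyone following along, here is a rough sketch of the two steps mentioned in the reply above: building the probability token set from the good and bad corpora, and combining the 15 most interesting probabilities. It is not the code described in this thread. It follows the formulas in the linked article as I understand them (good counts doubled, tokens seen fewer than 5 times skipped, probabilities clamped to 0.01..0.99, unknown tokens scored 0.4). The class name `SpamProbabilitySketch`, its method names, and the choice to count each distinct token once per message are all assumptions for illustration, and nothing here is a James API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: names are illustrative and not part of James.
final class SpamProbabilitySketch {
    static final double UNKNOWN_TOKEN = 0.4; // score for tokens missing from the set
    static final int MAX_INTERESTING = 15;   // how many tokens feed the final score

    // Builds the probability token set from per-token counts in each corpus.
    static Map<String, Double> buildProbabilities(Map<String, Integer> goodCounts, int goodMessages,
                                                  Map<String, Integer> badCounts, int badMessages) {
        if (goodMessages <= 0 || badMessages <= 0) {
            throw new IllegalArgumentException("both corpora must contain messages");
        }
        Set<String> tokens = new HashSet<>(goodCounts.keySet());
        tokens.addAll(badCounts.keySet());
        Map<String, Double> result = new HashMap<>();
        for (String token : tokens) {
            int g = 2 * goodCounts.getOrDefault(token, 0); // doubled to bias against false positives
            int b = badCounts.getOrDefault(token, 0);
            if (g + b < 5) {
                continue; // too rare to say anything useful
            }
            double goodRatio = Math.min(1.0, (double) g / goodMessages);
            double badRatio = Math.min(1.0, (double) b / badMessages);
            double p = badRatio / (goodRatio + badRatio);
            result.put(token, Math.max(0.01, Math.min(0.99, p)));
        }
        return result;
    }

    // Combines the probabilities of the most interesting tokens in one message.
    static double combine(List<String> messageTokens, Map<String, Double> probabilities) {
        List<Double> probs = new ArrayList<>();
        // Assumption: each distinct token counts once per message.
        for (String token : new LinkedHashSet<>(messageTokens)) {
            probs.add(probabilities.getOrDefault(token, UNKNOWN_TOKEN));
        }
        // "Interesting" means farthest from the neutral value 0.5.
        probs.sort(Comparator.comparingDouble((Double p) -> -Math.abs(p - 0.5)));
        double spam = 1.0;
        double ham = 1.0;
        for (double p : probs.subList(0, Math.min(MAX_INTERESTING, probs.size()))) {
            spam *= p;
            ham *= 1.0 - p;
        }
        return spam / (spam + ham); // 0.5 when the message has no tokens
    }
}
```

A matcher could then call `combine` on the tokens of an incoming message and flag it when the result exceeds some threshold; the article suggests 0.9, but that cutoff would need tuning against a real corpus.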
