Chris,

I came across this link as well.  I'm convinced this is a far more
effective way to block spam than any blacklist, checksum, or group-based
spam blocker.  It looks very promising.

I went ahead and put together a bunch of the code for this.  I thought
about how you would best want to build the corpus, and I decided to base
it on IMAP folders.  I'm working on an Ant task that could, on a daily or
weekly basis, trawl a set of IMAP folders to build the good and bad
corpora.

So far I have written the code that tokenizes MimeMessages, the code that
compares the good and bad corpora and builds the token probability set,
the Bayesian calculator that combines the probabilities of the 15 most
interesting words, and some other related utilities.  It's still a ways
from being anything useful, and it would work much better once James has
solid IMAP support.
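
For illustration, here is a minimal Java sketch of the two numeric steps:
building the token probability set from the two corpora, and combining the
15 most interesting probabilities.  It is not the code described above.
The class and method names are hypothetical, and the constants (good
counts doubled, tokens seen fewer than 5 times skipped, probabilities
clamped to 0.01..0.99, 0.4 for unknown tokens) are taken from the article
linked below rather than from anything in James.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names exist in James. */
final class BayesSketch {
    static final double UNKNOWN = 0.4;  // assumed probability for unseen tokens
    static final int INTERESTING = 15;  // number of tokens to combine

    /** Builds token -> spam probability from per-token counts in each corpus. */
    static Map<String, Double> buildProbabilities(
            Map<String, Integer> good, int goodMessages,
            Map<String, Integer> bad, int badMessages) {
        if (goodMessages <= 0 || badMessages <= 0) {
            throw new IllegalArgumentException("both corpora must be non-empty");
        }
        Set<String> tokens = new HashSet<>(good.keySet());
        tokens.addAll(bad.keySet());
        Map<String, Double> result = new HashMap<>();
        for (String token : tokens) {
            // Good counts are doubled to bias against false positives.
            int g = 2 * good.getOrDefault(token, 0);
            int b = bad.getOrDefault(token, 0);
            if (g + b < 5) {
                continue; // too rare to say anything about
            }
            double goodRatio = Math.min(1.0, (double) g / goodMessages);
            double badRatio = Math.min(1.0, (double) b / badMessages);
            double p = badRatio / (goodRatio + badRatio);
            // Clamp so a single token can never be fully decisive.
            result.put(token, Math.max(0.01, Math.min(0.99, p)));
        }
        return result;
    }

    /** Combines the probabilities of the tokens furthest from 0.5. */
    static double combine(Set<String> messageTokens,
                          Map<String, Double> probabilities) {
        List<Double> ps = new ArrayList<>();
        for (String token : messageTokens) {
            ps.add(probabilities.getOrDefault(token, UNKNOWN));
        }
        // Most interesting first: largest distance from the neutral 0.5.
        ps.sort(Comparator.comparingDouble((Double p) -> -Math.abs(p - 0.5)));
        double spam = 1.0;
        double ham = 1.0;
        for (double p : ps.subList(0, Math.min(INTERESTING, ps.size()))) {
            spam *= p;
            ham *= 1.0 - p;
        }
        return spam / (spam + ham);
    }
}
```

The counts passed to buildProbabilities would come from whatever tokenizer
runs over the two corpora; combine returns a value between 0 and 1, where
values near 1 suggest spam.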

The hard part about this approach is that you need a decent-sized corpus
to make it really usable.  I think it's pretty clear you could have a
matcher use the probability set to decide whether to mark a message as
spam, but again, building that corpus is the hardest part.
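
A rough sketch of how such a matcher's decision could look, in plain Java
so it stands alone.  Everything here is an assumption for illustration:
the class name, the X-Spam-Flag header name, lower-casing of tokens, and
the scorer parameter, which stands in for whatever combines the token
probabilities.  The 0.9 cutoff and the token rule (letters, digits, dashes,
apostrophes and dollar signs, with all-digit tokens ignored) follow the
article linked below.  A real James matcher would implement the Mailet
API's Matcher interface instead.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.ToDoubleFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; not part of James or the Mailet API. */
final class SpamFlagSketch {
    static final double THRESHOLD = 0.9; // assumed cutoff for "spam"

    // Token characters: letters, digits, dollar signs, apostrophes, dashes.
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z0-9$'-]+");

    /** Splits text into distinct lower-cased tokens, dropping pure numbers. */
    static Set<String> tokenize(String text) {
        Set<String> tokens = new LinkedHashSet<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String token = m.group().toLowerCase(Locale.ROOT);
            if (!token.matches("[0-9]+")) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** True when the scorer rates the message text above the threshold. */
    static boolean isSpam(String text, ToDoubleFunction<Set<String>> scorer) {
        return scorer.applyAsDouble(tokenize(text)) > THRESHOLD;
    }

    /** Header line a mail client could filter on (header name is made up). */
    static String flagHeader(String text, ToDoubleFunction<Set<String>> scorer) {
        return "X-Spam-Flag: " + (isSpam(text, scorer) ? "YES" : "NO");
    }
}
```

Flagging with a header rather than dropping the message leaves the final
decision to the mail client, which is what Chris asks for below.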

Serge Knystautas
Loki Technologies
http://www.lokitech.com/
----- Original Message -----
From: "Chris Means" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, August 16, 2002 2:48 PM
Subject: Anti-SPAM mailet


> Would anyone be interested in developing a mailet (or whatever) to
> implement some of the anti-spam techniques described in this article
> mentioned on /.?
>
> http://www.paulgraham.com/spam.html
>
> I'd rather it put a flag in the email so I could filter it in my email
> client, but it would be nice to have the option of automatically
> forwarding it to SPAMCop etc. for reporting purposes.
>
> Any thoughts?
>
> Thanks.
>
> -Chris
>


