Score Generation for Apache SpamAssassin

Duncan Findlay Mon, 23 Apr 2007 14:19:17 -0700

Hi everybody,

As you may already know, Steven Birk and I have been working on our
4th year undergraduate project in Math and Engineering at Queen's
University.

The goal of our project was to examine the use of logistic regression
as a potential replacement for the Perceptron/GA currently used by the
SpamAssassin project.

It's now done, and it's available here:
http://people.apache.org/~duncf/FindlayBirkThesis.pdf

Basically, we've found a technique that shows promise as a possible
replacement, but requires some modifications in order to handle some
of the restrictions the SpamAssassin project puts on scores.
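For anyone who hasn't seen logistic regression used this way, here is a minimal sketch. It assumes each message is reduced to a 0/1 vector recording which rules fired. This is plain, unconstrained logistic regression, trained by gradient descent with a small L2 penalty. It does not include any of the modifications for the score restrictions mentioned above, and it is not the method from the thesis. The function names, the toy data, and the rescaling in `to_rule_scores` are hypothetical illustrations. The rescaling maps the model's decision boundary onto a 5.0 threshold, which is SpamAssassin's default `required_score`.

```python
import math

def sigmoid(z):
    # Logistic function: maps a raw score to P(spam).
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(hits, labels, lr=0.5, epochs=2000, l2=0.01):
    """Batch gradient descent on mean log-loss plus an L2 penalty on the weights.

    hits:   one row per message, one 0/1 flag per rule (1 = the rule fired)
    labels: 1 for spam, 0 for ham
    Returns (weights, bias); the bias is not regularized.
    """
    n_msgs, n_rules = len(hits), len(hits[0])
    w = [0.0] * n_rules
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_rules
        grad_b = 0.0
        for x, y in zip(hits, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = sigmoid(z) - y  # derivative of log-loss with respect to z
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * (gw / n_msgs + l2 * wi) for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n_msgs
    return w, b

def to_rule_scores(w, b, threshold=5.0):
    """Hypothetical rescaling: 'sum of fired scores >= threshold' <=> 'w.x + b >= 0'.

    Requires b < 0: multiplying w.x >= -b by the positive constant
    threshold / -b turns the logistic boundary into a fixed score threshold.
    """
    if b >= 0:
        raise ValueError("bias must be negative to map onto a positive threshold")
    scale = threshold / -b
    return [wi * scale for wi in w]

# Toy corpus (hypothetical): rule 0 fires only on spam, rule 1 only on ham,
# rule 2 fires on both.
hits = [
    [1, 0, 0], [1, 0, 1], [1, 0, 0],                         # spam
    [0, 1, 0], [0, 0, 0], [0, 1, 1], [0, 0, 1], [0, 0, 0],   # ham
]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

w, b = train_logistic_regression(hits, labels)
scores = to_rule_scores(w, b)
for x in hits:
    total = sum(s for s, fired in zip(scores, x) if fired)
    print(x, round(total, 2), "spam" if total >= 5.0 else "ham")
```

The rescaling preserves every classification the fitted model makes. It only works when the fitted bias is negative, which is why `to_rule_scores` rejects a non-negative bias.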

I hope to try to make those modifications in the next month or so, but
I have no idea how well it will turn out, or how easy it will be.

The paper may be an interesting read for people not too familiar with
the way the scoring process works now, as it discusses many of the
issues that differentiate the scoring process from most other machine
learning problems. (Then again, it might just be boring.)

Enjoy!

-- 
Duncan Findlay
