-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

forwarding on behalf of Alexander...

- ------- Forwarded Message

Date:    Wed, 09 Nov 2005 13:08:47 +0100
From:    "Alexander K. Seewald" <[EMAIL PROTECTED]>
To:      [EMAIL PROTECTED]
Subject: SA-Train

Hi Justin,

I've implemented a training procedure for SpamAssassin which learns
the rule scores as well as the Bayes model. It can do a
cross-validation, and uses a linear-kernel SVM for score learning,
which should perform better than the perceptron (a perceptron is
essentially a randomized version of a linear SVM that does not
guarantee the maximum-margin hyperplane, but just some separating
hyperplane in case of linear separability, and nothing at all if the
data is not linearly separable).
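
As a rough illustration of the difference described above, here is a
minimal Python sketch. It is not the SA-Train code (which is written in
Perl); the toy data, the function names and the hyperparameters are all
assumptions made up for this example. The perceptron stops at the first
hyperplane that separates the data, while the linear SVM is trained by
plain subgradient descent on the regularized hinge loss, which is what
pushes it toward a large margin.

```python
# Toy comparison of a perceptron and a linear SVM for learning rule scores.
# Illustrative only: the data and all names below are hypothetical.

# Each entry: (rule-hit vector, label), with label +1 = spam, -1 = ham.
DATA = [((2.0, 2.0), 1), ((3.0, 1.0), 1), ((0.0, 0.0), -1), ((1.0, 0.0), -1)]


def score(w, b, x):
    """Linear score w.x + b; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_perceptron(data, max_epochs=1000):
    """Classic perceptron: update only on mistakes, stop when none remain."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            if y * score(w, b, x) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # any separating hyperplane is accepted
            break
    return w, b


def svm_objective(w, b, data, lam):
    """Regularized mean hinge loss: lam/2 * |w|^2 + mean(max(0, 1 - y*score))."""
    hinge = sum(max(0.0, 1.0 - y * score(w, b, x)) for x, y in data) / len(data)
    return 0.5 * lam * sum(wi * wi for wi in w) + hinge


def train_linear_svm(data, lam=0.01, lr=0.1, steps=2000):
    """Subgradient descent on svm_objective; returns the best iterate seen."""
    w, b = [0.0] * len(data[0][0]), 0.0
    best = (svm_objective(w, b, data, lam), w, b)
    for _ in range(steps):
        gw, gb = [lam * wi for wi in w], 0.0
        for x, y in data:
            if y * score(w, b, x) < 1.0:  # point is inside the margin
                gw = [gi - y * xi / len(data) for gi, xi in zip(gw, x)]
                gb -= y / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, gw)]
        b -= lr * gb
        obj = svm_objective(w, b, data, lam)
        if obj < best[0]:
            best = (obj, w, b)
    return best[1], best[2]


def min_margin(w, b, data):
    """Smallest signed distance of any training point to the hyperplane."""
    norm = sum(wi * wi for wi in w) ** 0.5
    if norm == 0.0:
        return 0.0
    return min(y * score(w, b, x) for x, y in data) / norm


if __name__ == "__main__":
    for name, (w, b) in (("perceptron", train_perceptron(DATA)),
                         ("linear SVM", train_linear_svm(DATA))):
        print(name, "w =", w, "b =", b, "min margin =", min_margin(w, b, DATA))
```

Running the script prints the learned weights and the smallest margin
for both methods, so the two hyperplanes can be compared directly. A
real setup would use a proper SVM solver rather than this hand-rolled
descent loop.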

Papers describing the work, plus the scripts, are available at
http://alex.seewald.at/spam. Please tell me if you find these useful,
and possibly add a link from spamassassin.org where appropriate.
I am also willing to contribute the code to SpamAssassin;
everything is written in Perl, so it should be easy to integrate.

On a less positive note, I have found (based on about one year of
experiments with similar systems, during which I built up an SA-based
filtering system at ÖFAI) that SA does not offer better performance
than pure Bayes systems such as SpamBayes. It is still competitive,
and the resulting models (Bayes + ruleset) are smaller and therefore
more efficient. These experiments have been undertaken on a local
corpus of around 100,000 mails from eight different users. We have a
spam/ham ratio of 20:1 (i.e. about 95% of incoming mail is spam).
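
For reference, the arithmetic behind the 20:1 figure can be written out
as a few lines of Python. The names are made up for this example, and
the corpus size is only the rounded "around 100,000" from above, so the
resulting counts are approximations, not measured values.

```python
# Convert a spam:ham ratio into a spam fraction and rough corpus counts.
# The corpus size is the rounded figure quoted in the text, not an exact count.

def spam_fraction(spam_parts, ham_parts):
    """Fraction of all mail that is spam for a spam:ham ratio."""
    return spam_parts / (spam_parts + ham_parts)


frac = spam_fraction(20, 1)          # 20 / 21
corpus = 100_000                     # approximate corpus size
approx_spam = round(corpus * frac)   # rough number of spam mails
approx_ham = corpus - approx_spam    # rough number of ham mails

print(f"spam fraction: {frac:.4f}, spam: {approx_spam}, ham: {approx_ham}")
```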

Best,
  Alex
- -- 
Dr.techn. Alexander K. Seewald

Solutions for the 21st century   +43(664)1106886
- ------------------------------------------------
         Information wants to be free;
Information also wants to be expensive (S.Brant)
- --------------- alex.seewald.at ----------------



- ------- End of Forwarded Message

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFDfV82MJF5cimLx9ARAnzZAJ49gDOgyM90JNFRTSYKkOnYMHyMcgCfe3WL
JVMTeLy27awRyEvxlsTRxwE=
=tQZm
-----END PGP SIGNATURE-----
