-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

forwarding on behalf of Alexander...

- ------- Forwarded Message

Date:    Wed, 09 Nov 2005 13:08:47 +0100
From:    "Alexander K. Seewald" <[EMAIL PROTECTED]>
To:      [EMAIL PROTECTED]
Subject: SA-Train

Hi Justin,

I've implemented a training procedure for SpamAssassin which learns the rule scores as well as the Bayes model. It can do cross-validation, and it uses a linear-kernel SVM for score learning, which should perform better than the perceptron. (A perceptron is essentially a randomized version of a linear SVM: it does not guarantee the maximum-margin hyperplane, but finds just some separating hyperplane if the data are linearly separable, and nothing at all if they are not.)

Papers describing the work, plus the scripts, are available at http://alex.seewald.at/spam. Please tell me if you find these useful, and possibly set a link from spamassassin.org where appropriate. I am also willing to contribute the code to SpamAssassin; everything is written in Perl, so it should be easy to integrate.

On a less positive note, I have found, based on about one year of experiments with similar systems (during which I built up an SA-based filtering system at ÖFAI), that SA does not offer better performance than pure Bayes systems such as SpamBayes. It is still competitive, and the resulting models (Bayes + ruleset) are smaller and therefore more efficient. These experiments were undertaken on a local corpus of around 100,000 mails from eight different users. We have a spam/ham ratio of 20:1 (i.e. about 95% of incoming mail is spam).

Best,
Alex

- -- 
Dr.techn. Alexander K. Seewald
Solutions for the 21st century
+43(664)1106886
- ------------------------------------------------
Information wants to be free;
Information also wants to be expensive (S. Brand)
- --------------- alex.seewald.at ----------------

- ------- End of Forwarded Message

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFDfV82MJF5cimLx9ARAnzZAJ49gDOgyM90JNFRTSYKkOnYMHyMcgCfe3WL
JVMTeLy27awRyEvxlsTRxwE=
=tQZm
-----END PGP SIGNATURE-----
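The perceptron-versus-SVM point in Alexander's message can be illustrated with a small Python sketch. This is not the SA-Train code (that is Perl, available from the URL in the message); it is a minimal sketch assuming each mail is reduced to a 0/1 vector of rule hits labelled +1 (spam) or -1 (ham), using two hypothetical rules and four made-up messages. The SVM here is a swapped-in solver: dual coordinate descent on the hinge loss, with the bias folded in as a constant feature (as LIBLINEAR does). The names `perceptron`, `train_linear_svm` and `geometric_margin` are hypothetical.

```python
def perceptron(X, y, max_epochs=100):
    """Classic perceptron: update weights only on misclassified examples."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:  # wrong side of (or on) the hyperplane
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                mistakes += 1
        if mistakes == 0:  # stop at the first separating hyperplane found
            break
    return w, b


def train_linear_svm(X, y, C=10.0, epochs=2000):
    """Soft-margin linear SVM (hinge loss, penalty C) via dual coordinate descent.

    The bias is folded in as a constant feature of 1.0, so it is regularized too.
    """
    Xa = [list(xi) + [1.0] for xi in X]
    w = [0.0] * len(Xa[0])
    alpha = [0.0] * len(Xa)  # one dual variable per training mail, kept in [0, C]
    for _ in range(epochs):
        for i, (xi, yi) in enumerate(zip(Xa, y)):
            grad = yi * sum(wj * xj for wj, xj in zip(w, xi)) - 1.0
            q_ii = sum(xj * xj for xj in xi)
            new_alpha = min(max(alpha[i] - grad / q_ii, 0.0), C)
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                # keep w = sum_i alpha_i * y_i * x_i in sync with alpha
                w = [wj + delta * yi * xj for wj, xj in zip(w, xi)]
                alpha[i] = new_alpha
    return w[:-1], w[-1]


def geometric_margin(w, b, X, y):
    """Smallest signed distance of any training mail to the hyperplane."""
    norm = sum(wj * wj for wj in w) ** 0.5
    return min(
        yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi, yi in zip(X, y)
    ) / norm


# Hypothetical rule hits [RULE_A, RULE_B]; +1 = spam, -1 = ham.
X = [[1, 1], [1, 0], [0, 0], [0, 1]]
y = [1, 1, -1, -1]

pw, pb = perceptron(X, y)
sw, sb = train_linear_svm(X, y)
print(pw, pb, round(geometric_margin(pw, pb, X, y), 3))  # [3.0, 0.0] -2.0 0.333
print(sw, sb, geometric_margin(sw, sb, X, y))
```

On this toy data the perceptron stops at the first separator it reaches (RULE_A scored 3, threshold 2, margin 1/3), while the SVM converges to roughly RULE_A = 2, threshold 1, the maximum-margin separator with margin 1/2. On non-separable data the perceptron loop would simply run out of epochs without converging, whereas the hinge-loss objective still has a well-defined optimum.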
