Continuing my experiments with a second Bayesian filter: I'm using spamprobe, controlling the tokens myself, and using SA to score the output.

I noticed that spam and ham often have different header fields. Some headers only show up in ham, and some only show up in spam. So I tokenized the headers themselves, feeding in just the header names as data, and got some really good results.

I don't know if SA is already doing this, but tokenizing the header names (excluding the common ones that every message has) is very effective.
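
For anyone who wants to try the idea, here is a rough Python sketch of the header-name tokenizing described above. It is only an illustration and not the actual setup: the COMMON_HEADERS list, the "hdr:" prefix, and the function name header_name_tokens are all assumptions made up for the example, and how the tokens get handed to the Bayesian filter is left out.

```python
from email import message_from_string

# Assumed exclusion list: headers expected in nearly every message,
# so they carry no signal. Adjust to taste; this list is hypothetical.
COMMON_HEADERS = {
    "from", "to", "subject", "date",
    "message-id", "received", "return-path",
}

def header_name_tokens(raw_message, prefix="hdr:"):
    """Return one token per distinct header name, skipping common ones.

    Only the header *names* are used; the header values are ignored.
    The prefix (an assumption) keeps these tokens from colliding with
    ordinary body-word tokens.
    """
    msg = message_from_string(raw_message)
    # Header names are case-insensitive, so normalize to lowercase.
    names = {name.lower() for name in msg.keys()}
    return sorted(prefix + n for n in names if n not in COMMON_HEADERS)
```

The resulting tokens could then be joined with spaces and given to the filter as the text to train or score on, in place of the original message.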

--
Marc Perkel - [EMAIL PROTECTED]

Spam Filter: http://www.junkemailfilter.com
   My Blog: http://marc.perkel.com
My Religion: http://www.churchofreality.org
~ "If it's real - we believe in it!" ~



