I noticed that spam and ham often have different header fields. Some headers only show up in ham, and some only show up in spam. So I tokenized the headers themselves, fed just the header names (not their values) in as data, and got some really good results.
I don't know if SA is already doing this, but tokenizing the header names (excluding the common ones that every message has) is very effective.
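Roughly, the idea could be sketched like this in Python. This is only an illustration, not my actual code and not anything SA does as far as I know: the function name, the "HDR:" token prefix, and the list of "common" headers are all assumptions made up for the example.

```python
from email import message_from_string

# Assumption: these headers appear in almost every message, so their mere
# presence says little about spam vs. ham. The real list would need tuning.
COMMON_HEADERS = frozenset({
    "from", "to", "subject", "date", "message-id", "received",
    "return-path", "mime-version", "content-type",
})


def header_name_tokens(raw_message, prefix="HDR:"):
    """Return one token per distinct, non-common header name in the message.

    Only header names are used; header values are ignored entirely.
    """
    msg = message_from_string(raw_message)
    # Header names are case-insensitive, so normalize before deduplicating.
    names = {name.lower() for name in msg.keys()}
    return sorted(prefix + name for name in names if name not in COMMON_HEADERS)


if __name__ == "__main__":
    sample = (
        "From: a@example.com\n"
        "To: b@example.com\n"
        "Subject: hi\n"
        "X-Mailer: Foo 1.0\n"
        "List-Unsubscribe: <mailto:x@example.com>\n"
        "\n"
        "body\n"
    )
    print(header_name_tokens(sample))
```

The resulting tokens would then go into whatever classifier you already train on body tokens; the prefix just keeps them from colliding with ordinary words.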
-- Marc Perkel - [EMAIL PROTECTED]
Spam Filter: http://www.junkemailfilter.com My Blog: http://marc.perkel.com My Religion: http://www.churchofreality.org ~ "If it's real - we believe in it!" ~
