That's a great call, thanks. I grepped my mail files and didn't find any SPAM_99 headers in any of them.

You should be looking for BAYES_99 and BAYES_999 in your corpus.


Thanks, Dave. I train SA from my various mailboxes (sa-learn --ham --mbox /home/thomas.cameron/mail/INBOX/[mailbox file] and then sa-learn --spam --mbox /home/thomas.cameron/mail/INBOX/spam). Doesn't that mean that I've already checked my corpora?

No, that's how you train SA on your corpora. If you manually look through the headers of mail that's already been processed by your mail system, the ham should score as close to BAYES_00 as possible, and the spam should hit BAYES_99. If that's not the case, then Bayes has been trained incorrectly.
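
A rough way to run that check over a whole mbox, rather than eyeballing individual messages, is to tally the BAYES_* rule names it contains. This is only a sketch: bayes_tally is a made-up helper name, and it assumes your setup writes the rule hits into the message headers (for example in X-Spam-Status). It greps the whole file, so a rule name quoted in a message body, or repeated in a second header, gets counted too.

```shell
# Hypothetical helper: count occurrences of each BAYES_* rule name in an mbox.
# Assumes rule hits were written into the stored messages by SpamAssassin.
bayes_tally() {
    grep -oE 'BAYES_[0-9]+' "$1" | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Run it once against a ham mbox and once against the spam mbox. A ham mbox dominated by anything other than BAYES_00, or a spam mbox with few BAYES_99 hits, would point at the training problem described above.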

/etc/mail/spamassassin/local.cf:
bayes_auto_learn  0
bayes_auto_expire 0

I'd also recommend disabling auto-learn, if you have that enabled.

If you've gone through your corpus manually, and are certain the ham is all good mail and the spam is all bad mail, then it might be worth dumping the existing Bayes database and retraining it from the corresponding mboxes.

I also typically add --progress to sa-learn.
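
For what it's worth, the reset-and-retrain could look roughly like the sketch below. The retrain_bayes function, the SA_LEARN variable and the default backup file name are all made up for illustration. The sa-learn options themselves (--backup, --clear, --progress, --ham, --spam, --mbox, --dump magic) are standard ones, but check them against the man page for your version before wiping anything.

```shell
# Sketch only: retrain_bayes and SA_LEARN are hypothetical names, not part of SpamAssassin.
# SA_LEARN can point at a different sa-learn binary (defaults to the one on PATH).
retrain_bayes() {
    ham_mbox=$1
    spam_mbox=$2
    backup_file=${3:-bayes-backup.txt}
    sa_learn=${SA_LEARN:-sa-learn}

    # Save the current database first so the reset can be undone with --restore.
    "$sa_learn" --backup > "$backup_file" || return 1

    # Wipe the existing Bayes database.
    "$sa_learn" --clear || return 1

    # Retrain from the hand-checked mboxes, showing progress as it goes.
    "$sa_learn" --progress --ham --mbox "$ham_mbox" || return 1
    "$sa_learn" --progress --spam --mbox "$spam_mbox" || return 1

    # Print the database summary to confirm it has been repopulated.
    "$sa_learn" --dump magic
}
```

It takes one ham mbox and one spam mbox as arguments. Any further ham mailboxes can be learned afterwards with additional sa-learn --progress --ham --mbox runs.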

Best,
Dave




Thomas
