1. A few months ago, several good messages were being tagged with the STATISTIC flag. Many three-letter words (such as 'and', 'the', 'nor', 'but') had a high 'spam' number (e.g. 0.57).

That's normal, actually. It just means that an E-mail containing the word "and" has a 57% chance of being spam (all other things being equal). When you think about it, that's probably about right: you would expect a word like "and" to be roughly as likely to appear in spam as in legitimate E-mail. A value of 0.57 is close to the neutral 0.5; it just leans slightly towards the spam side.
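To illustrate why a word at 0.57 carries little weight, here is a minimal sketch of how a Graham-style Bayesian filter might turn word counts into a per-word probability and then combine several words. It assumes the classic "A Plan for Spam" formulas with equal prior odds; IMail's actual implementation may differ, and the counts in the usage lines are made up for illustration.

```python
from math import prod


def word_spam_probability(spam_count, ham_count, n_spam, n_ham):
    """P(spam | word), assuming equal prior odds of spam and good mail.

    spam_count/ham_count: how many spam/good messages contained the word.
    n_spam/n_ham: total spam/good messages the filter was trained on.
    """
    spam_freq = spam_count / n_spam
    ham_freq = ham_count / n_ham
    return spam_freq / (spam_freq + ham_freq)


def combine(probabilities):
    """Combine per-word probabilities with the naive-Bayes product rule."""
    p_spam = prod(probabilities)
    p_ham = prod(1 - p for p in probabilities)
    return p_spam / (p_spam + p_ham)


# Hypothetical counts: "and" appears in 800 of 1000 spams and 600 of 1000 good messages.
p_and = word_spam_probability(800, 600, 1000, 1000)
print(round(p_and, 2))  # 0.57

# One strongly legitimate word (0.01) easily outweighs a near-neutral one.
print(round(combine([p_and, 0.01]), 3))  # 0.013
```

Graham's original scheme also looks only at the handful of words farthest from 0.5 in a message, which pushes near-neutral words like "and" even further into the background.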


2. My example shows new words with 0.45 even though they are fairly common: 'bluecross', 'blueshield'.

That is by design: a Bayesian filter assigns a default probability to a word the first time it sees it, however common that word is in your users' mail, and the estimate becomes more accurate as the word shows up in more classified messages. Apparently, 0.45 (slightly more likely to be legitimate) is the default.
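One common way to get this "start at a default, then converge" behaviour is Gary Robinson's adjustment, sketched below. Whether IMail uses this exact formula is an assumption; `unknown_prob=0.45` simply mirrors the default observed above, and `strength` is a hypothetical tuning parameter that controls how much evidence it takes to move away from the default.

```python
def smoothed_probability(raw_probability, n, unknown_prob=0.45, strength=1.0):
    """Blend a word's observed spam probability with a default value.

    raw_probability: probability computed from the training counts alone.
    n: number of training messages that contained the word.
    With n == 0 the result is exactly unknown_prob; as n grows it
    approaches raw_probability.
    """
    return (strength * unknown_prob + n * raw_probability) / (strength + n)


# A word that is really a good-mail indicator (raw probability 0.1)
# drifts from the 0.45 default towards 0.1 as it is seen more often.
for n in (0, 1, 10, 100):
    print(n, round(smoothed_probability(0.1, n), 3))
```

Under this model, a word like 'bluecross' stays at 0.45 until the filter has been trained on messages that contain it.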


What do you mean by 'for this specific user'? This server relays for two domains: primary for one and secondary for the other. The server simply tags the messages, and the second server (IMail 7.07, which holds the mailboxes) sorts them.

I believe that the Bayesian filtering in IMail v8's anti-spam can be "fed" data for specific users. So you could take all the spam the user has received and feed it to the Bayesian filter as spam, and take all the legitimate E-mail and feed it in as good mail. I believe there is an "antispamseeder" program used to do this.
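The following is a hypothetical illustration of what "feeding" per-user data means conceptually; it is not IMail's API, storage format, or the antispamseeder tool. Each user gets separate spam and good-mail word counts, so 'bluecross' can become a strong good-mail word for one user without affecting anyone else. The tokenizer, the 0.45 default, and the Graham-style 0.01/0.99 clamping are all assumptions.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-cased set of words; each word counts once per message."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


class PerUserCorpus:
    """Hypothetical per-user training store (not IMail's actual format)."""

    def __init__(self):
        # user -> Counter mapping word -> number of messages containing it
        self.spam_words = defaultdict(Counter)
        self.ham_words = defaultdict(Counter)
        # user -> number of spam/good messages trained
        self.spam_messages = Counter()
        self.ham_messages = Counter()

    def train(self, user, text, is_spam):
        """Feed one message to the user's corpus as spam or good mail."""
        words = tokenize(text)
        if is_spam:
            self.spam_words[user].update(words)
            self.spam_messages[user] += 1
        else:
            self.ham_words[user].update(words)
            self.ham_messages[user] += 1

    def word_probability(self, user, word, unknown_prob=0.45):
        """P(spam | word) for this user, or the default if never seen."""
        spam_count = self.spam_words[user][word]
        ham_count = self.ham_words[user][word]
        if spam_count + ham_count == 0:
            return unknown_prob  # word never seen in this user's training mail
        spam_freq = spam_count / max(self.spam_messages[user], 1)
        ham_freq = ham_count / max(self.ham_messages[user], 1)
        p = spam_freq / (spam_freq + ham_freq)
        return min(max(p, 0.01), 0.99)  # clamp so no single word is absolute


corpus = PerUserCorpus()
corpus.train("alice", "Your BlueCross BlueShield claim was processed", is_spam=False)
corpus.train("alice", "Cheap meds, buy now", is_spam=True)
print(corpus.word_probability("alice", "bluecross"))  # 0.01
print(corpus.word_probability("bob", "bluecross"))    # 0.45 (bob was never trained)
```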


Without per-user training data, Bayesian filtering is less effective. How much less depends on how similar your users' mail is: a small business will typically see better server-wide results than an ISP, for example. That is most likely a limitation you will need to accept.
-Scott



To Unsubscribe: http://www.ipswitch.com/support/mailing-lists.html
List Archive: http://www.mail-archive.com/imail_forum%40list.ipswitch.com/
Knowledge Base/FAQ: http://www.ipswitch.com/support/IMail/
