Can someone give me a quick lesson on how to adjust the good/spam numbers in the probability test? I know HOW to adjust the numbers, but I'm fuzzy on what to adjust them to. I've been using a ratio of 1:2 for okay words and up to 1:9 for undesired words.
You shouldn't.
The whole idea behind Bayesian filtering is that you can predict the chance that an E-mail is spam from data about prior spam and legitimate mail. If you change that data to something other than what was actually observed, you change the outcome of the filtering, and the prediction no longer reflects your real mail.
Changing the numbers is no different from taking a mean profit margin of 12.2% and rounding it up to 13% for your boss. It may seem nicer, it may make your boss happier, and it may take up less space on a report, but it won't be mathematically accurate. To be fair, the filter isn't mathematically exact even with correct data (Bayesian filtering is really just a guesstimate, based loosely on Bayes' Theorem), but there is still mathematical logic behind it, and that logic gets broken when you play with the numbers.
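To make that concrete, here is a rough sketch of the kind of arithmetic a Bayesian filter does. This is not IMail's actual code; the function names, the equal weighting of spam and good mail, and all of the counts are made up for illustration.

```python
def word_spam_probability(spam_hits, good_hits, n_spam, n_good):
    """Estimate how spammy a word is from how often it was seen in each corpus."""
    spam_rate = spam_hits / n_spam   # fraction of spam messages containing the word
    good_rate = good_hits / n_good   # fraction of good messages containing the word
    return spam_rate / (spam_rate + good_rate)


def combine(probabilities):
    """Naive Bayes combination of per-word probabilities into one message score."""
    spam_score = 1.0
    good_score = 1.0
    for p in probabilities:
        spam_score *= p
        good_score *= 1.0 - p
    return spam_score / (spam_score + good_score)


# Hypothetical word seen in 30 of 100 spam messages and 20 of 100 good ones.
observed = word_spam_probability(30, 20, 100, 100)

# The same word with its counts hand-edited to a 1:9 good/spam ratio.
adjusted = word_spam_probability(90, 10, 100, 100)

# Score a message containing three such words.
print(f"observed counts: {combine([observed] * 3):.3f}")  # 0.771
print(f"adjusted counts: {combine([adjusted] * 3):.3f}")  # 0.999
```

With the counts the filter actually observed, the message is a borderline case. With the hand-edited counts, the same message looks like near-certain spam, even though nothing about your mail has changed.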
If you haven't trained the Bayesian filter on this specific user's spam and legitimate E-mail, you should do so. Note that Bayesian filtering works much, much better when it is trained for each specific user (otherwise, for example, a mortgage broker will lose a lot of legitimate E-mail because of the preponderance of mortgage-related spam that other users get).
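The mortgage broker case can be sketched the same way. Again, this is only an illustration under assumptions: the helper names and the sample messages are invented, and real filters tokenize and clamp probabilities far more carefully than this.

```python
from collections import Counter


def train(messages):
    """Count how many messages contain each word (each word counted once per message)."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def spam_probability(word, spam_msgs, good_msgs):
    """Estimate how spammy a word is for a given pair of training corpora."""
    spam_rate = train(spam_msgs)[word] / len(spam_msgs)
    good_rate = train(good_msgs)[word] / len(good_msgs)
    total = spam_rate + good_rate
    # A word never seen in either corpus tells us nothing, so call it neutral.
    return 0.5 if total == 0 else spam_rate / total


# Invented sample data: the same spam, but two different sets of legitimate mail.
spam = ["low mortgage rates now", "refinance your mortgage today"]
other_users_good = ["lunch at noon", "meeting notes attached"]
brokers_good = [
    "mortgage application for the Smith file",
    "closing date for the mortgage",
    "lunch at noon",
]

print(f"{spam_probability('mortgage', spam, other_users_good):.2f}")  # 1.00
print(f"{spam_probability('mortgage', spam, brokers_good):.2f}")      # 0.60
```

Trained on other users' mail, "mortgage" looks like a sure sign of spam. Trained on the broker's own mail, it is only a weak hint, which is exactly why per-user training matters.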
-Scott
