| Without being fed data for individual users, Bayesian filtering becomes
| less effective (how much less effective depends on how similar your users
| are; a small business will see better server-wide results than an ISP, for
| example), and that is most likely just a limitation you would need to

 [--jimm replies]

If I may chime in on this thread, I should like to get some advice.  Right
now I teach the statistical filter which words are good by extracting the
false positives and running those against the antispamseeder.

That's a problem -- it skews the statistics that Bayes' Theorem relies on. Feeding in only the false positives does reinforce that the words in those messages are "legitimate" words, but the filter never sees your other legitimate E-mails, so the legitimate words that appear only in those messages look less legitimate to the filter than they should.
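
To make that concrete, here is a rough Python sketch of the effect. It is not how the antispamseeder or any particular product works; the function names, the add-one smoothing, and the toy messages are all assumptions made up for illustration. It scores the word "meeting" twice: once with only the false positives as the legitimate corpus, and once with all the legitimate mail.

```python
from collections import Counter

def train(messages):
    """Count how many messages contain each token."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts, len(messages)

def spamminess(token, spam, ham):
    """Estimate P(spam | token) from per-class message frequencies.

    Add-one smoothing (an assumption of this sketch) keeps unseen
    tokens from producing a hard 0 or 1.
    """
    spam_counts, n_spam = spam
    ham_counts, n_ham = ham
    p_in_spam = (spam_counts[token] + 1) / (n_spam + 2)
    p_in_ham = (ham_counts[token] + 1) / (n_ham + 2)
    return p_in_spam / (p_in_spam + p_in_ham)

# Made-up toy data, for illustration only.
spam_mail = ["cheap pills offer", "free offer now",
             "meeting cheap pills", "free pills now"]
false_positives = ["free offer from supplier"]
other_legit = ["meeting agenda attached", "meeting moved to friday",
               "lunch meeting tomorrow", "invoice attached"]

spam = train(spam_mail)
biased = spamminess("meeting", spam, train(false_positives))
full = spamminess("meeting", spam, train(false_positives + other_legit))

print(f"false positives only: {biased:.2f}")
print(f"all legitimate mail:  {full:.2f}")
```

With this toy data, "meeting" scores 0.50 when only the false positives are fed in and 0.37 when all the legitimate mail is fed in. The word is common in ordinary mail, but the filter cannot know that unless it sees that mail.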


How would one catch the true negatives as well? I have considered copying all
incoming messages, but that would also copy the stuff I intend to
quarantine.

That gets more difficult. Copying all messages doesn't work, as the copy will include false negatives (uncaught spam). Training on those as if they were legitimate E-mail helps ensure that the same spam won't get caught the next time either.


For Bayesian filtering to be properly trained, a human needs to sort through all the spam (both the spam that is caught, as well as the spam that is not caught) and all the legitimate E-mail (both the legitimate E-mail that gets through and any false positives). It can be a lot of work. Most people avoid the work, but that has the drawback of making the statistical algorithms less effective.
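
As a minimal sketch of that bookkeeping (the Message class, its field names, and the sample messages are hypothetical, not part of any real product), the point is that the training corpora are built from the human's decision, never from the filter's own verdict:

```python
from dataclasses import dataclass

@dataclass
class Message:
    text: str
    filter_said_spam: bool  # what the filter decided
    human_said_spam: bool   # what a person decided on review

def outcome(msg):
    """Name which of the four categories a reviewed message falls into."""
    if msg.human_said_spam:
        return "caught spam" if msg.filter_said_spam else "missed spam"
    return "false positive" if msg.filter_said_spam else "delivered legitimate"

def build_corpora(messages):
    """Split reviewed mail into training sets using the human label only."""
    spam = [m.text for m in messages if m.human_said_spam]
    legit = [m.text for m in messages if not m.human_said_spam]
    return spam, legit

# Made-up examples, one per category.
reviewed = [
    Message("cheap pills offer", True, True),
    Message("you have won a prize", False, True),
    Message("meeting agenda attached", False, False),
    Message("free offer from supplier", True, False),
]
spam_corpus, legit_corpus = build_corpora(reviewed)
```

Here the missed spam lands in the spam corpus and the false positive lands in the legitimate corpus, so both kinds of filter mistake get corrected rather than reinforced.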

When I first started work on Bayesian anti-spam algorithms (back before any anti-spam program offered Bayesian filtering), I didn't think that they would be very useful, mostly due to the amount of work required on a per-user level, as well as the invalid assumptions that are made (Bayes' Theorem has certain requirements that Bayesian filtering doesn't meet -- which is why an E-mail often has a "0.00001%" or "99.9999%" chance of being spam). Bayesian filters have turned out to be more useful than I originally envisioned, but they usually require work to keep the accuracy high.
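
For what it's worth, the extreme percentages are easy to reproduce. The sketch below assumes the common naive combination rule, which treats every word as independent evidence with equal priors. That independence is one requirement real E-mail presumably doesn't meet, and whether a given product combines scores exactly this way is an assumption. The combine function is a made-up name.

```python
import math

def combine(probabilities):
    """Combine per-word spam probabilities as if words were independent.

    Computes prod(p) / (prod(p) + prod(1 - p)) in log space to avoid
    underflow. Each p must be strictly between 0 and 1.
    """
    log_spam = sum(math.log(p) for p in probabilities)
    log_ham = sum(math.log(1.0 - p) for p in probabilities)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

# Fifteen words that are each only mildly suspicious (0.8) or
# mildly reassuring (0.2) still push the total to an extreme.
mostly_spammy = combine([0.8] * 15)
mostly_legit = combine([0.2] * 15)

print(f"{mostly_spammy * 100:.7f}%")
print(f"{mostly_legit * 100:.7f}%")
```

Because related words tend to show up together, counting each one as a separate piece of evidence overstates the certainty, which is how a message built from fairly ordinary words ends up at either end of the scale.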
-Scott


