Adam Lanier wrote:
> I think it sounds like an interesting idea. Any idea how the bayes data
> would work in highly technical environments (finance, medical, legal
> etc)? Our biggest issue these days is spam that 'looks' like finance
> related mail.

In our experience: not too badly. Spam terms tend to be misspelled.
Examples from our corpus:

"mortgage" appears in 3235 spams and 399 hams.
"m0rtgage" appears in 72 spams and 0 hams.
"Mortgage" appears in 769 spams and 337 hams.
"mortggage", "m0rtggage" and "m0rttgage" are 100% reliable indicators of spam.

(In fact, if you consider this message non-spam, it's the first one we've
seen that is an exception. :-))

> How would we incorporate our own stream of bayes data into the RPTN data?

Ah, well. By using CanIt-PRO. :-)

It is possible to post-process the RPTN data to include your own tokens,
but it would be a fair bit of work because we do not use SpamAssassin's
Bayes implementation. (SA doesn't handle token pairs, and it stores hashes
of tokens rather than the tokens themselves.)

> Sounds pretty cheap for ISP's or equivalent though if it increases the
> effectiveness of their spam system.

That's really the target market.

Regards,

David.

_______________________________________________
Visit http://www.mimedefang.org and http://www.roaringpenguin.com
MIMEDefang mailing list
[email protected]
http://lists.roaringpenguin.com/mailman/listinfo/mimedefang
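The per-token counts quoted in the message are the raw material a Bayesian filter works from. Below is a minimal Python sketch of one common way to turn such counts into a per-token spam probability: a Graham-style frequency ratio (without Graham's ham-count doubling and clamping), followed by Gary Robinson's smoothing so that rare tokens are not over-trusted. The post does not say how CanIt-PRO scores tokens, so treat this as an illustration of the general technique only. The corpus totals (`N_SPAM`, `N_HAM`) and the smoothing parameters `s` and `x` are assumptions, not data from the message.

```python
def token_spam_probability(spam_count, ham_count, n_spam, n_ham):
    """Graham-style estimate of P(spam | token) from document counts.

    spam_count / ham_count: number of spam / ham messages containing the token.
    n_spam / n_ham: total spam / ham messages in the training corpus
    (assumed inputs; the post does not give these totals).
    """
    if n_spam <= 0 or n_ham <= 0:
        raise ValueError("corpus sizes must be positive")
    # Normalise by corpus size so an unbalanced corpus doesn't skew the ratio.
    spam_freq = spam_count / n_spam
    ham_freq = ham_count / n_ham
    if spam_freq + ham_freq == 0:
        return 0.5  # token never seen in training: treat as neutral
    return spam_freq / (spam_freq + ham_freq)


def robinson_smoothed(spam_count, ham_count, n_spam, n_ham, s=1.0, x=0.5):
    """Robinson's f(w): shrink the raw estimate towards x for rare tokens.

    s is the strength of the prior and x the assumed probability of an
    unseen token; s=1, x=0.5 are common defaults, not values from the post.
    """
    n = spam_count + ham_count  # number of messages that contain the token
    p = token_spam_probability(spam_count, ham_count, n_spam, n_ham)
    return (s * x + n * p) / (s + n)


if __name__ == "__main__":
    # Hypothetical equal corpus sizes; only the per-token counts come from the post.
    N_SPAM = N_HAM = 100_000
    for token, spams, hams in [("mortgage", 3235, 399),
                               ("m0rtgage", 72, 0),
                               ("Mortgage", 769, 337)]:
        raw = token_spam_probability(spams, hams, N_SPAM, N_HAM)
        smoothed = robinson_smoothed(spams, hams, N_SPAM, N_HAM)
        print(f"{token:10s} raw={raw:.4f} smoothed={smoothed:.4f}")
```

With equal corpus sizes the raw estimate reduces to spams / (spams + hams), so "m0rtgage" scores exactly 1.0. Smoothing pulls it down to about 0.993, reflecting that 72 sightings are weaker evidence than several thousand, and it keeps a single zero ham count from forcing the combined score to certainty.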

