Martin Blapp wrote:

> I already log possible text (I count alphanumeric chars in the OCR output)
I think it would be interesting to add a new text/plain part to the e-mail consisting of the OCR'd text, and feed that into Bayes. Even if OCR gets some words wrong, I bet the same misspelled tokens would quickly rise to the top of the "spammy" token list.

We did some tests along these lines, and as a side benefit, we discovered some SARE stock-scam tests firing on the OCR output.

Regards,

David.
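
For illustration only, here is a minimal sketch of the idea discussed above: count the alphanumeric characters in the OCR output and, if there are enough of them, append the text as an extra text/plain part so that downstream content filters see it. It uses only the Python standard library rather than MIMEDefang's Perl API, and the names `count_alnum`, `add_ocr_part`, and the `min_alnum` threshold of 20 are assumptions made for this example, not anything from the tests mentioned in the thread. The OCR step itself is assumed to have already produced a string.

```python
from email.message import EmailMessage


def count_alnum(text: str) -> int:
    """Count alphanumeric characters, as a rough 'is this real text?' signal."""
    return sum(ch.isalnum() for ch in text)


def add_ocr_part(msg: EmailMessage, ocr_text: str, min_alnum: int = 20) -> bool:
    """Append ocr_text to msg as an inline text/plain part.

    Returns False and leaves msg untouched when the OCR output has fewer
    than min_alnum alphanumeric characters (hypothetical threshold).
    """
    if count_alnum(ocr_text) < min_alnum:
        return False
    # add_attachment() converts a non-multipart message to multipart/mixed
    # and attaches a new part; disposition="inline" marks it as body text
    # rather than a downloadable attachment.
    msg.add_attachment(ocr_text, subtype="plain", disposition="inline")
    return True


if __name__ == "__main__":
    message = EmailMessage()
    message["Subject"] = "image-only mail"
    message.set_content("see attached image")
    # Misrecognised words such as "ST0CK" are kept as-is: they simply
    # become tokens of their own for the Bayes classifier.
    add_ocr_part(message, "HOT ST0CK ALERT buy XYZ n0w huge gains expected")
    print(message.as_string())
```

The modified message would then be handed to the spam scanner as usual; how that hand-off works depends on the filter setup and is not shown here.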

