Hi,
On a different note concerning images: what about having the email filter log
the possibility that an image contains hidden data (i.e. a steganography test)?
I already log possible text (I count alphanumeric chars in the OCR output):
+header SPAMPIC_ALPHA_1 OCR-Output =~ /OCRTEXT: more than alpha1 chars found/
+describe SPAMPIC_ALPHA_1 Image contains many alphanumeric chars
+score SPAMPIC_ALPHA_1 0.500
+
+header SPAMPIC_ALPHA_2 OCR-Output =~ /OCRTEXT: more than alpha2 chars found/
+describe SPAMPIC_ALPHA_2 Image contains many alphanumeric chars
+score SPAMPIC_ALPHA_2 1.000
+
+header SPAMPIC_ALPHA_3 OCR-Output =~ /OCRTEXT: more than alpha3 chars found/
+describe SPAMPIC_ALPHA_3 Image contains many alphanumeric chars
+score SPAMPIC_ALPHA_3 1.500
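
The counting step behind these rules could look roughly like the following
Python sketch. It is only an illustration: the function names and the threshold
values are hypothetical (the real limits behind alpha1 to alpha3 are not given
here), and it assumes the OCR step reports one "OCRTEXT: ..." string per
threshold that was exceeded.

```python
# Hypothetical limits; the actual values for alpha1..alpha3 are not stated above.
THRESHOLDS = {"alpha1": 20, "alpha2": 50, "alpha3": 100}


def count_alnum(ocr_text):
    """Count the alphanumeric characters in the OCR output."""
    return sum(1 for ch in ocr_text if ch.isalnum())


def ocr_header_lines(ocr_text, thresholds=THRESHOLDS):
    """Return one 'OCRTEXT: ...' string per threshold the count exceeds."""
    n = count_alnum(ocr_text)
    ordered = sorted(thresholds.items(), key=lambda kv: kv[1])
    return [
        "OCRTEXT: more than %s chars found" % name
        for name, limit in ordered
        if n > limit
    ]
```

With these assumed limits, an image whose OCR output holds 60 alphanumeric
characters would produce the alpha1 and alpha2 strings, so the first two rules
would fire.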
You could now run a statistical analysis to see whether the character
frequencies match those typical of a particular language, to check if it's
really text.
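
A minimal sketch of such a frequency check, in Python. It assumes English as
the reference language and uses approximate letter frequencies; the function
name, the table, and the choice of cosine similarity as the measure are all
illustrative assumptions, not part of any existing filter.

```python
import math
from collections import Counter

# Approximate relative frequencies (percent) of letters in English text.
ENGLISH_FREQ = {
    "e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "i": 7.0, "n": 6.7,
    "s": 6.3, "h": 6.1, "r": 6.0, "d": 4.3, "l": 4.0, "c": 2.8,
    "u": 2.8, "m": 2.4, "w": 2.4, "f": 2.2, "g": 2.0, "y": 2.0,
    "p": 1.9, "b": 1.5, "v": 1.0, "k": 0.8, "j": 0.15, "x": 0.15,
    "q": 0.10, "z": 0.07,
}


def letter_similarity(text, reference=ENGLISH_FREQ):
    """Cosine similarity between the letter counts of text and the reference.

    Returns a value between 0.0 and 1.0; higher means the letter
    distribution looks more like the reference language.
    """
    letters = [ch for ch in text.lower() if ch in reference]
    if not letters:
        return 0.0  # no letters at all, nothing to compare
    counts = Counter(letters)
    dot = sum(counts[ch] * reference[ch] for ch in reference)
    norm_obs = math.sqrt(sum(c * c for c in counts.values()))
    norm_ref = math.sqrt(sum(v * v for v in reference.values()))
    return dot / (norm_obs * norm_ref)
```

OCR output that is real English should score close to 1, while noise
misread as characters should score much lower. Where to put the cut-off would
have to be tuned on real mail, and short OCR snippets will give unreliable
scores.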
Martin
_______________________________________________
NOTE: If there is a disclaimer or other legal boilerplate in the above
message, it is NULL AND VOID. You may ignore it.
Visit http://www.mimedefang.org and http://www.roaringpenguin.com
MIMEDefang mailing list [email protected]
http://lists.roaringpenguin.com/mailman/listinfo/mimedefang