Henrik Krohns wrote:
On Sat, Oct 28, 2006 at 09:20:55AM -0700, Dennis Peterson wrote:
I've explored OCR on both color and de-colorized images and there have
been successes, but not enough to warrant turning it on in production. It
is very CPU-intensive.

I don't get it. Unless you have some big honeypot, maybe 5% of traffic
contains small images to be OCRed. If your server can't handle that, I guess
it's running out of juice anyway. :)

You can even easily create a separate scanning queue for OCR, so it doesn't
interfere with normal traffic.
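
A minimal sketch of the separate-queue idea, assuming a Python-based filter. Every name here (`route`, `needs_ocr`, `ocr_worker`, `OCR_MAX_BYTES`) and the 50 KB cutoff are hypothetical illustrations, not part of ClamAV or any particular scanner. Small image attachments are handed to a background worker while everything else stays on the normal path:

```python
import queue
import threading

OCR_MAX_BYTES = 50_000  # assumed cutoff for a "small" image; tune for your traffic

ocr_queue = queue.Queue()   # work deferred to the OCR worker
ocr_results = []            # (message_id, ocr_output) pairs, filled by the worker


def needs_ocr(content_type, size):
    """Only small image attachments are candidates for OCR."""
    return content_type.startswith("image/") and size <= OCR_MAX_BYTES


def route(message_id, content_type, payload):
    """Queue OCR candidates and return which path the attachment took."""
    if needs_ocr(content_type, len(payload)):
        ocr_queue.put((message_id, payload))
        return "ocr"
    return "normal"


def ocr_worker(run_ocr):
    """Drain the OCR queue until a None sentinel arrives.

    run_ocr is whatever OCR callable you plug in (hypothetical here).
    """
    while True:
        item = ocr_queue.get()
        if item is None:
            ocr_queue.task_done()
            break
        message_id, payload = item
        ocr_results.append((message_id, run_ocr(payload)))
        ocr_queue.task_done()
```

The worker would be started once with something like `threading.Thread(target=ocr_worker, args=(my_ocr,), daemon=True).start()`, so slow OCR runs never block the main scanning path.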

You may have missed that I'm in the image industry. A great deal of what we do is imagery, including imagery with text in it, and as we have to scan all images over a particular size, OCR would require more CPU than it is worth. And when you consider repeating it all at a disaster recovery site, it starts to be a lot of computing power for something with a high false-positive probability.

You cannot count on image spam being GIF: PNG images are showing up now, as are JPGs, and animated GIFs are also out there. OCR isn't practical for me, but it may be for others for a while, at least until spammers start to use CAPTCHA technology to get around it.
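
Since the format can't be assumed, a filter would have to identify images by content rather than by file name or declared type. A rough sketch, assuming Python; `sniff_image_type` is a hypothetical helper that checks only the well-known leading signature bytes of GIF, PNG and JPEG files:

```python
def sniff_image_type(data: bytes):
    """Guess the image format from its leading magic bytes.

    Returns "png", "gif", "jpeg", or None if the signature is not recognized.
    """
    if data.startswith(b"\x89PNG\r\n\x1a\n"):   # 8-byte PNG signature
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):      # both GIF versions, animated or not
        return "gif"
    if data.startswith(b"\xff\xd8\xff"):        # JPEG start-of-image marker
        return "jpeg"
    return None
```

This only tells you what kind of image arrived; it does nothing to make the OCR step itself cheaper or more accurate.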

dp
_______________________________________________
http://lurker.clamav.net/list/clamav-users.html
