Henrik Krohns wrote:
> On Sat, Oct 28, 2006 at 09:20:55AM -0700, Dennis Peterson wrote:
>> I've explored OCR on both color and de-colorized images and there have
>> been successes, but not enough to warrant turning it on in production. It
>> is very CPU intensive.
>
> I don't get it. Unless you have some big honeypot, maybe 5% of traffic
> contains small images to be OCR'd. If your server can't handle that, I guess
> it's running out of juice anyway. :)
>
> You can even easily create a separate scanning queue for OCR, so it doesn't
> interfere with normal traffic.
You may have missed that I'm in the image industry - a great deal of
what we do is imagery, including imagery with text in it, and as we have
to scan all images over a particular size, OCR would require more CPU
than it is worth. And when you consider repeating it all at a disaster
recovery site, it starts to be a lot of computing power with a high
false-positive probability.
You cannot count on the image spam being GIF, as PNG images are showing
up now, as are JPG, and animated GIFs are also out there. OCR isn't
practical for me but may be for others for a while - at least until
spammers start to use CAPTCHA technology to get around it.
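
To make the two points above concrete, here is a rough sketch, not
anything ClamAV itself provides. It identifies GIF, PNG, and JPEG
attachments by their leading magic bytes rather than by file extension,
and hands small images to a separate OCR queue so that normal scanning
is not held up. The names sniff_image_type, route_attachment, and
MAX_OCR_BYTES, and the 64 KB cutoff, are all assumptions made up for
illustration.

```python
import queue

# Leading "magic" bytes for the image formats mentioned in this thread.
SIGNATURES = {
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}

# Hypothetical cutoff: only images at or below this size are sent to OCR.
MAX_OCR_BYTES = 64 * 1024

normal_queue = queue.Queue()  # regular scanning path
ocr_queue = queue.Queue()     # separate, slower OCR path


def sniff_image_type(data: bytes):
    """Return 'gif', 'png', or 'jpeg' based on magic bytes, else None."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return None


def route_attachment(name: str, data: bytes) -> str:
    """Queue an attachment and report which queue it went to."""
    kind = sniff_image_type(data)
    if kind is not None and len(data) <= MAX_OCR_BYTES:
        ocr_queue.put((name, kind, data))
        return "ocr"
    normal_queue.put((name, data))
    return "normal"
```

A worker process could then drain ocr_queue at its own pace. Whether
that is affordable still depends on how much of the traffic is images,
which is exactly the problem in my case.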
dp
_______________________________________________
http://lurker.clamav.net/list/clamav-users.html