On Friday 03 November 2006 10:55, Carl Cerecke wrote:
> On 02/11/06, Steve Holdoway <[EMAIL PROTECTED]> wrote:
> > Hey,
> >
> > If you're interested, bounce your ideas off us! If we can work out a new
> > way of fuzzy matching on these images, then you'll be helping the OSS
> > cause big time. I haven't manipulated images in this way in the best part
> > of 20 years, so I'm a bit rusty!
>
> Text recognition in images is hard. The spammers (the ones I've seen)
> fuzz up the background a bit and have different coloured text to make
> it even more difficult. There is a good reason why text-in-images
> makes a good CAPTCHA (see Wikipedia).
>
> I would suggest some image transforms first to get the shape of the
> letters easier to detect. Maybe convert to grayscale then increase
> contrast. You're probably better off splitting up groups of letters
> into words and then matching words rather than letters.
>
> There's bound to be some research on this. Maybe ACM digital library
> or something.
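
To make the grayscale-then-contrast suggestion concrete, here is a rough
sketch in plain Python. It is an illustration only, not anything a real
filter is known to use: the function names are made up, the "image" is
just a list of (r, g, b) tuples rather than a decoded file, and the
fixed threshold of 128 is an arbitrary assumption.

```python
# Sketch only: pixels are plain (r, g, b) tuples, not a decoded image file.
# All function names here are hypothetical.

def to_grayscale(pixels):
    """Convert (r, g, b) tuples to 0-255 luminance values (BT.601 weights)."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]

def stretch_contrast(gray):
    """Linearly rescale so the darkest value becomes 0 and the brightest 255."""
    lo, hi = min(gray), max(gray)
    if hi == lo:  # flat image: nothing to stretch
        return list(gray)
    return [round((v - lo) * 255 / (hi - lo)) for v in gray]

def binarize(gray, threshold=128):
    """Map each value to 0 (dark) or 255 (light) around an assumed threshold."""
    return [0 if v < threshold else 255 for v in gray]

# Low-contrast "text on fuzzy background": values bunched between 100 and 140.
pixels = [(100, 100, 100), (110, 110, 110), (140, 140, 140)]
cleaned = binarize(stretch_contrast(to_grayscale(pixels)))
```

After the stretch, the bunched-up values spread over the full 0-255
range, so a simple threshold separates dark from light pixels far more
reliably than it would on the original image.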

I heard today that there is a SpamAssassin plugin for checking for
text-in-images, and that it had to be adapted to check all frames in an
animated GIF, not just the first one.

So I suggest poking around the SpamAssassin website.

> Cheers,
> Carl.

Later
Lee Begg
