On Friday 03 November 2006 10:55, Carl Cerecke wrote:
> On 02/11/06, Steve Holdoway <[EMAIL PROTECTED]> wrote:
> > Hey,
> >
> > If you're interested, bounce your ideas off us! If we can work out a new
> > way of fuzzy matching on these images, then you'll be helping the OSS
> > cause big time. I haven't manipulated images in this way in the best part
> > of 20 years, so I'm a bit rusty!
>
> Text recognition in images is hard. The spammers (the ones I've seen)
> fuzz up the background a bit and have different coloured text to make
> it even more difficult. There is a good reason why text-in-images
> makes a good CAPTCHA (see wikipedia).
>
> I would suggest some image transforms first to get the shape of the
> letters easier to detect. Maybe convert to grayscale then increase
> contrast. You're probably better off splitting up groups of letters
> into words and then matching words rather than letters.
>
> There's bound to be some research on this. Maybe ACM digital library
> or something.

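For what it's worth, the grayscale-then-contrast step Carl describes could look roughly like the sketch below. This is only an illustration, not anything taken from an existing filter: the function names, the BT.601 luma weights, and the simple min/max contrast stretch are all my own assumptions, and the sample pixels are made up. It works on plain nested lists so it needs no imaging library.

```python
def to_grayscale(pixels):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luma values.

    The weights are the common ITU-R BT.601 luma coefficients (an assumption;
    any reasonable grayscale weighting would do for this purpose).
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in pixels
    ]


def stretch_contrast(gray):
    """Linearly rescale so the darkest pixel maps to 0 and the brightest to 255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:
        # Flat image: there is no contrast to stretch, so return a copy.
        return [row[:] for row in gray]
    scale = 255 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in gray]


# Made-up sample: pale coloured "text" pixels on a slightly noisy light background.
image = [
    [(200, 200, 200), (120, 130, 180), (210, 205, 200)],
    [(190, 195, 205), (110, 125, 170), (205, 210, 200)],
]
gray = to_grayscale(image)
stretched = stretch_contrast(gray)
```

After the stretch, the text pixels sit near 0 and the background near 255, which should make letter shapes easier to pick out before any word matching is attempted. A real filter would first have to decode the image and would likely need something more robust than a plain min/max stretch.
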
I heard today that there is a SpamAssassin plugin for checking for
text-in-images, and that it had to be adapted to check all frames in an
animated GIF, not just the first one. So I suggest poking around the
SpamAssassin website.

> Cheers,
> Carl.

Later
Lee Begg
