The author of the presentation I linked to earlier pointed me to this: http://wiki.apache.org/jakarta-lucene/SpellChecker
Which is implemented by: http://www.marine-geo.org/services/oai/docs/javadoc/org/apache/lucene/spell/NGramSpeller.html

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Friday, January 25, 2008 7:31 AM
To: java-user@lucene.apache.org
Subject: Re: Lucene to index OCR text

Thanks everyone for their ideas and suggestions! Some had occurred to us but were discarded because we feel our solution needs to be automated -- 45 million pages are a lot of thrust on any human-driven effort.

I like Itamar's idea of doing "competing" OCR, and keeping the best result. Unfortunately OCR software is far from cheap, and the cost of 2 different product licenses may be too high for the project.

I've also looked into the Tesseract/OCRopus, but while the ideas are good it ain't there yet.

> On Jan 25, 2008 6:12 AM, mark harwood <[EMAIL PROTECTED]> wrote:
>
>> Probably not a practical solution for you to set up but I love this
>> idea:
>> http://blog.wired.com/monkeybites/2007/05/recaptcha_fight.html
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]