The author of the presentation I linked to earlier pointed me to this:

http://wiki.apache.org/jakarta-lucene/SpellChecker

Which is implemented by:

http://www.marine-geo.org/services/oai/docs/javadoc/org/apache/lucene/spell/
NGramSpeller.html


 

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Friday, January 25, 2008 7:31 AM
To: java-user@lucene.apache.org
Subject: Re: Lucene to index OCR text

Thanks everyone for their ideas and suggestions! Some had occurred to us but
were discarded because we feel our solution needs to be automated --
45 million pages are a lot of thrust on any human-driven effort.

I like Itamar's idea of doing "competing" OCR, and keeping the best result.
Unfortunately OCR software is far from cheap, and the cost of 2 different
product licenses may be too high for the project.

I've also looked into the Tesseract/OCRopus, but while the ideas are good it
ain't there yet.


> On Jan 25, 2008 6:12 AM, mark harwood <[EMAIL PROTECTED]> wrote:
>
>> Probably not a practical solution for you to set up but I love this
>> idea:
>>  http://blog.wired.com/monkeybites/2007/05/recaptcha_fight.html
>>



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to