Re: [CODE4LIB] web-based ocr

Michael Beccaria Wed, 13 Mar 2013 12:25:46 -0700

Tesseract had really poor quality the last time I tried it, and ABBYY server is 
ridiculously expensive (and charges per page). Leadtools has an OCR SDK, but it 
too is expensive. If you want to go relatively cheap on this (and I don't know 
for sure, but it would probably break some licensing agreement with ABBYY), you 
could set up a web server with a $99 version of ABBYY FineReader with a hot 
folder set up to convert anything that is dropped into it to txt. You would 
then have to write the backend to keep track of the files that were submitted, 
let ABBYY convert them, and then show the results to the end user.
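
A rough sketch of that backend bookkeeping, in Python. Everything named here 
is hypothetical (the HotFolderQueue class, the inbox/outbox folders), and it 
assumes the converter writes a .txt file with the same base name as the input 
into a separate output folder. I have not checked how FineReader actually 
names its output, so treat that as an assumption to verify.

```python
import shutil
import time
import uuid
from pathlib import Path
from typing import Dict, Optional


class HotFolderQueue:
    """Tracks files handed to a hot-folder OCR converter (hypothetical helper)."""

    def __init__(self, inbox: Path, outbox: Path) -> None:
        self.inbox = inbox    # folder the OCR program watches (assumed)
        self.outbox = outbox  # folder where it drops .txt results (assumed)
        self.jobs: Dict[str, str] = {}  # job id -> original file name

    def submit(self, upload: Path) -> str:
        """Copy an uploaded file into the hot folder and return a job id."""
        job_id = uuid.uuid4().hex
        # Use the job id as the file name so identical upload names never collide.
        shutil.copy(upload, self.inbox / f"{job_id}{upload.suffix}")
        self.jobs[job_id] = upload.name
        return job_id

    def result(self, job_id: str) -> Optional[str]:
        """Return the converted text, or None if the converter is not done yet."""
        if job_id not in self.jobs:
            raise KeyError(f"unknown job: {job_id}")
        txt = self.outbox / f"{job_id}.txt"
        if not txt.exists():
            return None
        return txt.read_text(encoding="utf-8", errors="replace")

    def wait(self, job_id: str, timeout: float = 60.0,
             interval: float = 1.0) -> Optional[str]:
        """Poll for the result until it appears or the timeout expires."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            text = self.result(job_id)
            if text is not None:
                return text
            time.sleep(interval)
        return None
```

The web front end would call submit() when a file is uploaded and result() 
(or wait()) when the user asks for the text. A real version would also want 
to persist the job table and guard against the converter picking up a 
half-copied file.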


Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
[email protected]
Become a friend of Paul Smith's Library on Facebook today!

-----Original Message-----
From: Code for Libraries [mailto:[email protected]] On Behalf Of Eric 
Lease Morgan
Sent: Tuesday, March 12, 2013 2:16 PM
To: [email protected]
Subject: Re: [CODE4LIB] web-based ocr

Thank you for the prompt replies. 

Call me cheap or unable to navigate the political/fiscal landscape, but I don't 
see myself subscribing to a service. Instead I see putting a wrapper around 
Tesseract, but alas, the wrappers are written in languages that I don't know. 
[1] Hmmm... On the Perl side, I am having problems installing 
Image::OCR::Tesseract. 

[1] Wrappers - http://code.google.com/p/tesseract-ocr/wiki/AddOns
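
A wrapper does not have to be a language binding; it can be a thin shell-out 
to the tesseract command-line program. A minimal sketch in Python, assuming 
the tesseract binary is installed and on the PATH (build_command and 
ocr_image are hypothetical names, not part of any existing wrapper):

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def build_command(image: Path, output_base: Path, lang: str = "eng") -> List[str]:
    """Build the command line: tesseract <image> <outputbase> -l <lang>."""
    # tesseract appends ".txt" to the output base on its own.
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_image(image: Path, lang: str = "eng") -> str:
    """Run tesseract on one image and return the recognized text."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "out"
        # check=True raises CalledProcessError if tesseract exits non-zero.
        subprocess.run(build_command(image, base, lang),
                       check=True, capture_output=True)
        return base.with_suffix(".txt").read_text(encoding="utf-8")
```

Calling ocr_image(Path("page.tif")) would then return the text of that page, 
which a small web script could hand back to the browser.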

--
Eric "Still Cogitating" Morgan
