Re: [CODE4LIB] web-based ocr

chris fitzpatrick Tue, 12 Mar 2013 12:04:57 -0700

Hi,

With regard to handwriting, you could always train an OCR library to do this, and there are several OCR libraries that attempt to do this out of the box (probably the most notable is Evernote)... but yeah, the results vary greatly depending on the style of writing. Most focus on just hand-printed things like Post-its.

And a quick thing I found out recently about Tesseract: it is pretty good if all you want is the text extracted. It does not do layout recognition very well, so the output will look funky if there are layout oddities, like footnotes. But it really depends on what you have and what you're trying to do. For example, I did not have much success making EPUBs with Tesseract, but it worked great with our theses (which have mandatory layout requirements). So that's another big bonus for using the Internet Archive (who, I think, use ABBYY).
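If plain text extraction is all you need, a thin wrapper can simply shell out to the `tesseract` command-line program. Here is a minimal Python sketch, assuming `tesseract` is installed and on the PATH (Tesseract 4 or later for the `--psm` spelling). `ocr_command` and `extract_text` are hypothetical names, not part of any existing wrapper:

```python
import shutil
import subprocess
from typing import List, Optional


def ocr_command(image_path: str, lang: str = "eng", psm: Optional[int] = None) -> List[str]:
    """Build the argv for the tesseract CLI, sending plain text to stdout.

    Passing "stdout" as the output base makes tesseract print the recognized
    text instead of writing it to <outputbase>.txt.
    """
    cmd = ["tesseract", image_path, "stdout", "-l", lang]
    if psm is not None:
        # Page segmentation mode: spelled --psm in Tesseract 4+, -psm in 3.x.
        cmd += ["--psm", str(psm)]
    return cmd


def extract_text(image_path: str, lang: str = "eng", psm: Optional[int] = None) -> str:
    """Run tesseract on a single image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        ocr_command(image_path, lang, psm),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits non-zero
    )
    return result.stdout
```

For example, `extract_text("page.png", psm=6)` asks Tesseract to treat the page as a single uniform block of text. Choosing a segmentation mode changes how the page is split up, but it does not add the kind of layout recognition described above.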

b,chris.


Eric Lease Morgan wrote:

Thank you for the prompt replies.
Call me cheap or unable to navigate the political/fiscal landscape, but I don't see myself subscribing to a service. Instead, I see myself putting a wrapper around Tesseract, but alas, the wrappers are written in languages that I don't know. [1] Hmmm… On the Perl side, I am having problems installing Image::OCR::Tesseract.
[1] Wrappers - http://code.google.com/p/tesseract-ocr/wiki/AddOns
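Since the goal is web-based OCR, a wrapper could be as small as an HTTP endpoint that accepts an image and returns Tesseract's plain-text output. The following is a minimal Python sketch using only the standard library, assuming the `tesseract` command-line program is installed and on the PATH. `tesseract_bytes`, `make_handler`, and `serve` are hypothetical names. To keep the sketch short, it reads the raw request body rather than parsing multipart form uploads and has no authentication or size limits:

```python
import subprocess
import tempfile
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def tesseract_bytes(image: bytes) -> str:
    """OCR raw image bytes via the tesseract CLI (assumed to be on PATH)."""
    # This sketch spools the upload to a temporary file and passes its path.
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(image)
        tmp.flush()
        result = subprocess.run(
            ["tesseract", tmp.name, "stdout"],
            capture_output=True, text=True, check=True,
        )
    return result.stdout


def make_handler(ocr: Callable[[bytes], str] = tesseract_bytes):
    """Return a request-handler class that OCRs the raw body of each POST."""

    class OCRHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length", 0))
            if length <= 0:
                self.send_error(400, "Empty request body")
                return
            image = self.rfile.read(length)
            try:
                text = ocr(image)
            except (subprocess.CalledProcessError, OSError):
                # OSError also covers a missing tesseract binary.
                self.send_error(500, "OCR failed")
                return
            body = text.encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return OCRHandler


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Serve OCR requests until interrupted."""
    HTTPServer((host, port), make_handler()).serve_forever()
```

Once `serve()` is running, a page image could be posted with `curl --data-binary @page.png http://127.0.0.1:8000/`. Passing a different `ocr` callable to `make_handler` lets you swap in another engine, or test the endpoint without Tesseract installed.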

--
Eric "Still Cogitating" Morgan
