Hi,

With regard to handwriting, you could always train an OCR library to handle it, and several OCR tools attempt this out of the box (Evernote is probably the most notable). But yeah, the results vary greatly depending on the style of writing. Most focus on hand-printed text, such as Post-it notes.

And a quick thing I found out recently about Tesseract: it is pretty good if all you want is the text extracted. It does not do layout recognition very well, so the output will look funky if there are layout oddities, like footnotes. But it really depends on what you have and what you're trying to do. For example, I did not have much success making EPUBs with Tesseract, but it worked great with our theses (which have mandatory layout requirements). So that's another big bonus for using the Internet Archive (who, I think, use ABBYY).
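
For the plain-text case, a wrapper does not need to be much more than a call to the tesseract command-line program. Here is a minimal sketch in Python, assuming the tesseract binary is installed and on your PATH. The function names (build_command, extract_text) and the "run" parameter are made up for illustration and are not part of any existing wrapper; treat it as a starting point rather than a finished tool.

```python
import subprocess
import tempfile
from pathlib import Path


def build_command(image_path, output_base, lang="eng"):
    """Return the argument list for: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def extract_text(image_path, lang="eng", run=subprocess.run):
    """Run Tesseract on one image and return the recognized plain text.

    The "run" parameter is a hypothetical hook so the function can be
    exercised without Tesseract installed; by default it is subprocess.run.
    """
    with tempfile.TemporaryDirectory() as tmp:
        output_base = Path(tmp) / "page"
        # check=True raises CalledProcessError if tesseract exits non-zero.
        run(build_command(image_path, output_base, lang), check=True)
        # By default Tesseract writes its text output to <output_base>.txt.
        return (Path(tmp) / "page.txt").read_text(encoding="utf-8")
```

Calling extract_text("scan.png") on a page image (the file name is only an example) gives you the recognized text as one string. It does nothing about layout, so footnotes and the like will still come out wherever Tesseract happens to put them.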



b,chris.


Eric Lease Morgan wrote:

Thank you for the prompt replies.

Call me cheap or unable to navigate the political/fiscal landscape, but I don't see myself subscribing to a service. Instead I see putting a wrapper around Tesseract, but alas, the wrappers are written in languages that I don't know. [1] Hmmm… On the Perl side, I am having problems installing Image::OCR::Tesseract.

[1] Wrappers - http://code.google.com/p/tesseract-ocr/wiki/AddOns

--
Eric "Still Cogitating" Morgan
