Re: [Opencast Matterhorn] How to improve OCR performance

Karen Dolan Tue, 03 Apr 2012 11:15:01 -0700

Miguel,

Also, my text extraction was extremely poor until the TESSDATA_PREFIX environment variable was set to point to the tessdata folder! Until then, it couldn't even find the dictionary.


*http://www.mail-archive.com/[email protected]/msg01852.html*
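
For anyone who wants to check this before re-running OCR, here is a minimal sketch (not part of Matterhorn) that reports whether TESSDATA_PREFIX is set and whether a language file can be found under it. The helper names and the file name eng.traineddata are my assumptions. Since Tesseract versions differ on whether the variable should name the tessdata folder itself or its parent, the sketch accepts either layout.

```python
import os
from pathlib import Path
from typing import Optional


def find_traineddata(prefix: str, lang: str = "eng") -> Optional[Path]:
    """Return the path of <lang>.traineddata under prefix, or None if absent.

    Hypothetical helper; the file name pattern is an assumption.
    """
    base = Path(prefix)
    # Assumed layout 1: prefix is the parent of a "tessdata" folder.
    # Assumed layout 2: prefix is the "tessdata" folder itself.
    candidates = (
        base / "tessdata" / f"{lang}.traineddata",
        base / f"{lang}.traineddata",
    )
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


def check_tessdata_prefix(lang: str = "eng") -> str:
    """Describe the state of TESSDATA_PREFIX in the current environment."""
    prefix = os.environ.get("TESSDATA_PREFIX")
    if not prefix:
        return "TESSDATA_PREFIX is not set"
    found = find_traineddata(prefix, lang)
    if found is None:
        return f"no {lang}.traineddata found under {prefix}"
    return f"found {found}"


if __name__ == "__main__":
    print(check_tessdata_prefix())
```

Run it in the same environment (same user, same shell or service configuration) that starts Matterhorn, because the variable is only useful if the process that launches Tesseract can see it.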

Karen

On 4/3/2012 1:51 PM, Karen Dolan wrote:

Miguel,

Matterhorn trunk (from a couple weeks ago) was configured to pull down Leptonica 1.66 and Tesseract 3.00. I went and retrieved Leptonica 1.67 and Tesseract 3.01 directly, along with the latest Tesseract English dictionary (Reference: http://code.google.com/p/tesseract-ocr/wiki/ReadMe).


The text extraction is now much better than it was a few months ago.

Good luck!
Karen



On 4/3/2012 11:29 AM, Miguel Del Agua wrote:

Hi,

I just installed version 1.3 and it seems to work correctly, but the OCR
performance is quite poor. I've tried installing a new dictionary as
described in the wiki, but the performance is still bad. So I would like
to know if it's possible to improve text recognition, either by
changing some parameters of OCRopus or by improving the dictionary in
some way.

Thanks in advance.
_______________________________________________
Matterhorn mailing list
[email protected]
http://lists.opencastproject.org/mailman/listinfo/matterhorn


To unsubscribe please email
[email protected]
_______________________________________________

