Re: [Matterhorn-users] OCR Text Tuneup - more

Ruediger Rolf Fri, 20 May 2011 00:57:06 -0700

Hi Hank,

I know that the current strategy for the OCR is to generate the OCR screenshot at the end of the segment, because the segment detection ignores additional text that appears later on the slide. So the screenshot in the player shows the beginning of the slide (in many of my examples only the first bullet point), but the text contained in the OCR comes from the end of the segment. Unfortunately, the segment detection does not always recognize new slides, so the text may be totally unrelated to the segment preview.
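
To illustrate the mismatch, here is a minimal sketch in Python. It is not Matterhorn code: the Segment class, the function names and the one-second margin are assumptions made up for this example. It only shows that the preview frame and the OCR frame come from opposite ends of the same segment.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical slide segment, times in seconds."""
    start: float
    end: float


def preview_time(seg: Segment) -> float:
    # The player preview is taken at the beginning of the segment,
    # when often only the first bullet point is visible.
    return seg.start


def ocr_time(seg: Segment, margin: float = 1.0) -> float:
    # The OCR frame is taken near the end of the segment, when all
    # text on the slide has appeared. The margin is an assumed safety
    # offset so the frame does not fall on the next slide.
    return max(seg.start, seg.end - margin)


if __name__ == "__main__":
    # Made-up segment boundaries for illustration only.
    segments = [Segment(0.0, 95.0), Segment(95.0, 240.0)]
    for seg in segments:
        print(f"preview at {preview_time(seg)}s, OCR at {ocr_time(seg)}s")
```

If the segment detection misses a slide change inside a segment, the frame at ocr_time already shows a different slide than the frame at preview_time, which would explain text that does not match the preview.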


Rüdiger

On 20.05.2011 02:04, Hank Magnuski wrote:

The text segments also seem scrambled in order - the text listed does not match the slide shown on the left side.
Hank

On Thu, 19 May 2011, Hank Magnuski wrote:
I'm trying out the OCR-->text function and I'm getting about 25% recognizable words and 75% gibberish.
My dictionaries were registered into the database and I see the tables have about 20K entries.
Any hints on debugging this? I'm using the default workflow.
There are a lot of words missing, too. Does the 3rd party tools package produce reasonable quality text extraction?
_______________________________________________
Matterhorn-users mailing list
[email protected]
http://lists.opencastproject.org/mailman/listinfo/matterhorn-users



--

________________________________________________
Rüdiger Rolf, M.A.
Universität Osnabrück - Zentrum virtUOS
Heger-Tor-Wall 12, 49069 Osnabrück
Telefon: (0541) 969-6511 - Fax: (0541) 969-16511
E-Mail: [email protected]
Internet: www.virtuos.uni-osnabrueck.de

