Hi Hank,

I know that the current strategy for the OCR is to generate the OCR screenshot at the end of the segment, because the segment detection ignores additional text that appears on the slide later. So the screenshot in the player shows the beginning of the slide (in many of my examples only the first bullet point), but the text contained in the OCR comes from the end of the segment. Unfortunately, the segment detection does not always recognize new slides, so the text may be totally unrelated to the segment preview.
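
To illustrate the mismatch, here is a minimal sketch in Python. It is not the actual Matterhorn code: the names Segment, frame_times and slide_at, and the one-second ocr_offset, are all hypothetical and only model the behaviour described above (preview taken at the segment start, OCR frame taken near the segment end).

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # segment start in seconds
    end: float    # segment end in seconds


def frame_times(segment, ocr_offset=1.0):
    """Return (preview_time, ocr_time) for one detected segment.

    Assumption: the preview is grabbed at the segment start and the OCR
    frame shortly before the segment end (ocr_offset is made up).
    """
    preview_time = segment.start
    ocr_time = max(segment.start, segment.end - ocr_offset)
    return preview_time, ocr_time


def slide_at(t, slide_changes):
    """Index of the slide visible at time t, given the real change times."""
    return sum(1 for change in slide_changes if change <= t)


# The slides really change at 60 s and 120 s, but in this made-up example
# the detector missed the change at 60 s, so the first segment spans two slides.
true_changes = [60.0, 120.0]
detected = [Segment(0.0, 120.0), Segment(120.0, 180.0)]

mismatches = []
for seg in detected:
    preview_t, ocr_t = frame_times(seg)
    # True when the preview and the OCR frame show different slides
    mismatches.append(
        slide_at(preview_t, true_changes) != slide_at(ocr_t, true_changes)
    )
```

In this example the first segment gets a preview of the first slide but OCR text from the second one, while the correctly detected second segment is consistent.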

Rüdiger

On 20.05.2011 02:04, Hank Magnuski wrote:

The text segments also seem scrambled in order - the text listed does not match the slide shown on the left side.

Hank

On Thu, 19 May 2011, Hank Magnuski wrote:


I'm trying out the OCR-->text function and I'm getting about 25% recognizable words and 75% gibberish.

My dictionaries were registered into the database and I see the tables have about 20K entries.

Any hints on debugging this? I'm using the default workflow.

There are a lot of words missing, too. Does the 3rd-party tools package produce reasonable-quality text extraction?
_______________________________________________
Matterhorn-users mailing list
[email protected]
http://lists.opencastproject.org/mailman/listinfo/matterhorn-users



--

________________________________________________
Rüdiger Rolf, M.A.
Universität Osnabrück - Zentrum virtUOS
Heger-Tor-Wall 12, 49069 Osnabrück
Telefon: (0541) 969-6511 - Fax: (0541) 969-16511
E-Mail: [email protected]
Internet: www.virtuos.uni-osnabrueck.de
