Hi Hank,
I know that the current OCR strategy is to generate the OCR screenshot at
the end of each segment, because segment detection ignores additional
text that appears later on the same slide. So the screenshot in the
player shows the beginning of the slide (in many of my examples only the
first bullet point), but the OCR text comes from the end of the segment.
Unfortunately, segment detection does not always recognize new slides,
so the text may be completely unrelated to the segment preview.
Rüdiger
On 20.05.2011 02:04, Hank Magnuski wrote:
The text segments also seem to be out of order: the text listed does
not match the slide shown on the left side.
Hank
On Thu, 19 May 2011, Hank Magnuski wrote:
I'm trying out the OCR-->text function and I'm getting about 25%
recognizable words and 75% gibberish.
My dictionaries were registered in the database and I see the
tables have about 20K entries.
Any hints on debugging this? I'm using the default workflow.
There are a lot of words missing, too. Does the third-party tools
package produce reasonable-quality text extraction?
_______________________________________________
Matterhorn-users mailing list
[email protected]
http://lists.opencastproject.org/mailman/listinfo/matterhorn-users
--
________________________________________________
Rüdiger Rolf, M.A.
Universität Osnabrück - Zentrum virtUOS
Heger-Tor-Wall 12, 49069 Osnabrück
Telefon: (0541) 969-6511 - Fax: (0541) 969-16511
E-Mail: [email protected]
Internet: www.virtuos.uni-osnabrueck.de