Dear all,

We are currently struggling with text extraction too, and we are finding
that Matterhorn is a little Anglo-centric: it does not handle words
containing characters outside the [a-zA-Z_0-9] range. We are working on some
fixes (partly thanks to Karen's advice, thanks!), but some of them involve
changing Java code and revisiting design decisions that can be regarded as
bugs. We want to test this thoroughly, and perhaps we'll submit the changes
for the 1.4 version, since this would not be a new feature but a correction
to something that is already in.
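
To illustrate the kind of behaviour we mean, here is a minimal sketch. It is
not the actual Matterhorn code: the class name, the method and the two
patterns are made up for this example, and we are only assuming that the
filtering relies on something equivalent to the default \w class of
java.util.regex, which covers exactly [a-zA-Z_0-9].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not taken from the Matterhorn sources.
public class WordFilterSketch {

    // By default, \w in java.util.regex means [a-zA-Z_0-9] only.
    public static final Pattern ASCII_WORD = Pattern.compile("\\w+");

    // Unicode-aware alternative: letters and numbers from any script, plus '_'.
    public static final Pattern UNICODE_WORD = Pattern.compile("[\\p{L}\\p{N}_]+");

    // Returns every substring of 'text' matched by 'pattern', in order.
    public static List<String> words(Pattern pattern, String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // "versión más reciente", written with Unicode escapes so that the
        // source file encoding does not matter.
        String line = "versi\u00f3n m\u00e1s reciente";
        // The ASCII-only pattern splits words at every accented letter.
        System.out.println(words(ASCII_WORD, line));
        // The Unicode-aware pattern keeps the three words whole.
        System.out.println(words(UNICODE_WORD, line));
    }
}
```

With the ASCII-only pattern, "versión" is broken into "versi" and "n", so a
dictionary lookup on the fragments fails. On Java 7 and later, compiling
"\\w+" with the Pattern.UNICODE_CHARACTER_CLASS flag should be another way to
get Unicode-aware matching without spelling out the character classes.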

Best regards

2012/4/12 Miguel Del Agua <[email protected]>

> Thank you very much, but in my case the captures seem to be OK. Anyway,
> the problem was due to the versions of some third-party tools, and also
> to incorrect dictionary loading. More info:
>
> http://opencast.3480289.n2.nabble.com/How-to-improve-OCR-performance-tp7433198p7458735.html
>
> Regards,
>
> Miguel
>
>
> 2012/4/5 費納德費納德 <[email protected]>:
> > Hello Miguel,
> >
> > Take a look at the captures the workflow gets from the video. In my case I
> > got grey-pattern captures in 90% of the cases, so the OCR was able to
> > recognize almost nothing. I solved it by reinstalling ffmpeg and all the
> > dependent packages. Now the OCR works almost perfectly. But I have an
> > issue with the ffmpeg version: with recordings longer than 5 min I get
> > errors during the video and audio mux. (With version 1.2 I didn't get
> > these errors, only with 1.3. Maybe I installed something in a different
> > way.)
> >
> > So I am not sure whether you have this problem with the OCR, but it is
> > possible.
> >
> >
> > Regards,
> >
> > Fernando Hernández Esguevillas.
> >
> > P.S. If you have any questions about how to install the latest version of
> > ffmpeg, let me know and I'll send you a link, although the information is
> > easy to find on Google. Regards.
> >
> > On 4 April 2012 00:15, Miguel Del Agua <[email protected]>
> > wrote:
> >>
> >> Hi,
> >>
> >> I just installed version 1.3 and it seems to work correctly, but the OCR
> >> performance is quite poor. I've tried to install a new dictionary as
> >> described in the wiki, but the performance is still bad. So I would like
> >> to know if it's possible to improve text recognition, either by
> >> changing some OCRopus parameters or by improving the dictionary in
> >> some way.
> >>
> >> Thanks in advance.
> >> _______________________________________________
> >> Matterhorn-users mailing list
> >> [email protected]
> >> http://lists.opencastproject.org/mailman/listinfo/matterhorn-users
> >
