Hi Ruben, would you mind sharing some details around the bugs you found and the improvements you are about to suggest? Maybe attach a patch to an open ticket?
Thanks,
Tobias

On 12.04.2012, at 12:40, Rubén Pérez <[email protected]> wrote:

> Dear all,
>
> We are currently struggling with the text extraction, too, and we are seeing
> that Matterhorn is a little anglo-centric and does not like words with
> characters outside the [a-zA-Z_0-9] range. We are making some developments
> (partially thanks to Karen's advice --thanks!) but some of these involve
> changing some Java code and some design decisions which can be regarded as
> bugs. We want to test this thoroughly and perhaps we'll submit them for the
> 1.4 version, since this wouldn't be a new feature, but correcting something
> that is already in.
>
> Best regards
>
> 2012/4/12 Miguel Del Agua <[email protected]>
> Thank you very much, but in my case the captures seem to be OK. Anyway,
> the problem was due to the versions of some third-party tools, and also
> due to incorrect dictionary loading. More info:
> http://opencast.3480289.n2.nabble.com/How-to-improve-OCR-performance-tp7433198p7458735.html
>
> Regards,
>
> Miguel
>
>
> 2012/4/5 費納德費納德 <[email protected]>:
> > Hello Miguel,
> >
> > Take a look at the captures the workflow gets from the video. In my case
> > I got grey-pattern captures in 90% of the cases, so the OCR was able to
> > recognize almost nothing. I solved it by reinstalling ffmpeg and all the
> > dependent packages. Now the OCR works almost perfectly. But I have an
> > issue with the ffmpeg version: for recordings longer than 5 min I get
> > errors during the video and audio mux. (With version 1.2 I didn't get
> > these errors, only with 1.3. Maybe I installed something differently.)
> >
> > So I am not sure if you have this problem with the OCR but it is possible.
> >
> >
> > Regards,
> >
> > Fernando Hernández Esguevillas.
> >
> > P.S. If you have any questions about how to install the most recent
> > version of ffmpeg, let me know and I'll send you a link, although the
> > information is easy to find on Google. Regards.
> >
> > On 4 April 2012 00:15, Miguel Del Agua <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> I just installed version 1.3 and it seems to work correctly, but the OCR
> >> performance is quite poor. I've tried to install a new dictionary as
> >> described in the wiki, but the performance is still bad. So I would like
> >> to know if it's possible to improve text recognition either by
> >> changing some parameters of OCRopus or improving in some way the
> >> dictionary.
> >>
> >> Thanks in advance.
> >> _______________________________________________
> >> Matterhorn-users mailing list
> >> [email protected]
> >> http://lists.opencastproject.org/mailman/listinfo/matterhorn-users
