Re: [Matterhorn-users] Error extracting text

Ruediger Rolf Thu, 01 Mar 2012 07:48:59 -0800

The grey images are a well-known issue that can come up with MPEG2 videos.

It's an issue with ffmpeg. We have the choice between a fast image extraction for OCR and a reliable one that is incredibly slow.


This is the current encoding profile for the images:

profile.text-analysis.http.ffmpeg.command = -strict unofficial -y -ss #{time} -i #{in.video.path} -r 1 -vframes 1 -f image2 -pix_fmt rgb24 #{out.dir}/#{out.name}#{out.suffix}

If you replace it with the following, you will not see the grey images again (this simply changes the position of -ss #{time} in the command line, moving it after the input file):


profile.text-analysis.http.ffmpeg.command = -strict unofficial -y -i #{in.video.path} -ss #{time} -r 1 -vframes 1 -f image2 -pix_fmt rgb24 #{out.dir}/#{out.name}#{out.suffix}
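
The only difference between the two profiles is where -ss sits relative to -i. With -ss before -i, ffmpeg seeks in the input, which is fast. With -ss after -i, it decodes from the start and discards frames until the requested time, which is slow but reliable. The following Python sketch builds both argument orders so the difference is easy to see. The function name, the "accurate" flag and the file names are hypothetical and are not part of Matterhorn; only the ffmpeg options are taken from the profiles above.

```python
import shlex


def build_extract_args(video, time_s, out_path, accurate=True):
    """Build an ffmpeg argument list that extracts a single frame.

    accurate=True  puts -ss after -i  (decode and discard: slow, reliable).
    accurate=False puts -ss before -i (seek in the input: fast).
    All names here are illustrative; only the ffmpeg options come from
    the encoding profiles quoted in this message.
    """
    head = ["ffmpeg", "-strict", "unofficial", "-y"]
    seek = ["-ss", str(time_s)]
    source = ["-i", video]
    tail = ["-r", "1", "-vframes", "1", "-f", "image2",
            "-pix_fmt", "rgb24", out_path]
    middle = source + seek if accurate else seek + source
    return head + middle + tail


if __name__ == "__main__":
    # Hypothetical input and output names, for illustration only.
    print(shlex.join(build_extract_args("in.mov", 30, "out.tif", accurate=False)))
    print(shlex.join(build_extract_args("in.mov", 30, "out.tif", accurate=True)))
```

Running the script prints the fast variant first and the reliable variant second; the two lines differ only in the position of "-ss 30".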


Good luck
Rüdiger

On 01.03.2012 16:38, [email protected] wrote:

Hi,
should I also open an issue in jira if we do get segmentation (i.e. many pictures), but most of the generated TIFFs are grey, thus yielding no text extraction?
As stated earlier in this thread, Kristof Keppens gets this issue too.

Regards, Andreas
Tobias Wunden wrote on Thu, 1 Mar 2012, subject "Re: [Matterhorn-users]...":
Frank,
could you 1) open an issue for this in jira and 2) make your test files, workflow and encoding settings available?
Tobias
On 27.02.2012, at 12:24, Frank Van Damme <[email protected]> wrote:
2012/2/24 Frank Van Damme <[email protected]>:
2012/2/24 Kristof Keppens <[email protected]>:
As far as text recognition goes, has anyone got it working with the 1.2 release? Text recognition itself works fine; it's the creation of the TIFF files for recognition that fails. Most of the time these TIFFs are solid grey, so it's normal that no text is found. Could this be solved by a
different ffmpeg version?
I doubt it, since the version for 1.3 IS different from the one in
1.2. (0.6 versus 0.8.2).
I'm not too sure that the image generation is the actual problem
either - I'm seeing this in the log file:

10:48:10  INFO (VideoSegmenterServiceImpl:502) - Found new scene at 0 s
10:48:11  INFO (VideoSegmenterServiceImpl:329) - Segmentation of
file:/var/lib/matterhorn/opencast_storagedir/workspace/collection/composer/726.mov
yields 1 segments
10:48:11  INFO (VideoSegmenterServiceImpl:350) - Finished video
segmentation of
file:/var/lib/matterhorn/opencast_storagedir/workspace/collection/composer/726.mov
I'd think this means Matterhorn will only generate one image in the
first place, which is the one at the beginning (otoh: it has text in
it, which is not recognized).

P.S. it would be totally great if this could be fixed before or
shortly after the 1.3 release.

--
Frank Van Damme
No part of this copyright message may be reproduced, read or seen,
dead or alive or by any means, including but not limited to telepathy
without the benevolence of the author.
_______________________________________________
Matterhorn-users mailing list
[email protected]
http://lists.opencastproject.org/mailman/listinfo/matterhorn-users
-----------------------
[email protected]
01/58801 DW 41523
mobil: 0664/60 588 4523
TU Wien
DVR-Nummer: 0005886
-----------------------



--

________________________________________________
Rüdiger Rolf, M.A.
Universität Osnabrück - Zentrum virtUOS
Heger-Tor-Wall 12, 49069 Osnabrück
Telefon: (0541) 969-6511 - Fax: (0541) 969-16511
E-Mail: [email protected]
Internet: www.virtuos.uni-osnabrueck.de

