Re: [Matterhorn-users] Error extracting text

Andreas . Krieger Tue, 21 Feb 2012 01:25:08 -0800

Hi,

I keep getting gray TIFF images during text extraction; only about 1 in 20 is an "OK" picture.


In the logs I see this (excerpt for one of the pics):

2012-02-21 00:09:48  INFO (ComposerServiceImpl:487) - creating an image using video track 404c1358-aa99-4938-aa31-8a52066d74cf
2012-02-21 00:09:48  INFO (AbstractCmdlineEncoderEngine:234) - Executing encoding command: /usr/local/bin/ffmpeg -strict inofficial -y -ss 4737 -i /opt/matterhorn/opencast/workspace/mediapackage/Unscheduled-lecturetube-fhhs1-1300958299999/404c1358-aa99-4938-aa31-8a52066d74cf/Screen.mpg -r 1 -vframes 1 -f image2 -pix_fmt rgb24 /opt/matterhorn/opencast/workspace/mediapackage/Unscheduled-lecturetube-fhhs1-1300958299999/404c1358-aa99-4938-aa31-8a52066d74cf/Screen_4737_7e6b17f8-f273-4825-a163-71c8ec6505bf.4737.jpeg
2012-02-21 00:09:49  INFO (FFmpegEncoderEngine:174) - [mpeg @ 0xf3a8440]max_analyze_duration reached
2012-02-21 00:09:49  INFO (FFmpegEncoderEngine:174) - encoder         : Lavf52.64.2
2012-02-21 00:09:49  INFO (FFmpegEncoderEngine:174) - [mpeg1video @ 0xf3a9870]warning: first frame is no keyframe
2012-02-21 00:09:49  INFO (AbstractCmdlineEncoderEngine:258) - Video track Screen.mpg successfully encoded using profile 'text-analysis.http'
2012-02-21 00:09:49  INFO (ComposerServiceImpl:864) - Deleted local copy of image file at /opt/matterhorn/opencast/workspace/mediapackage/Unscheduled-lecturetube-fhhs1-1300958299999/404c1358-aa99-4938-aa31-8a52066d74cf/Screen_4737_7e6b17f8-f273-4825-a163-71c8ec6505bf.4737.jpeg
2012-02-21 00:09:54  INFO (TextAnalyzerServiceImpl:158) - Converting http://matterhorntest.zserv.tuwien.ac.at/files/collection/composer/703672_0.jpeg to tif format
2012-02-21 00:10:00  INFO (ComposerServiceImpl:622) - Converting http://matterhorntest.zserv.tuwien.ac.at/files/collection/composer/703672_0.jpeg
2012-02-21 00:10:00  INFO (AbstractCmdlineEncoderEngine:234) - Executing encoding command: /usr/local/bin/ffmpeg -y -f image2 -i /opt/matterhorn/opencast/workspace/collection/composer/703672_0.jpeg -f image2 /opt/matterhorn/opencast/workspace/collection/composer/703672_0_b8bfb926-ccf8-4584-947f-31dfc1683de0.tif
2012-02-21 00:10:00  INFO (FFmpegEncoderEngine:174) - [swscaler @ 0x1f7a5770]No accelerated colorspace conversion found from yuv420p to rgb24.
2012-02-21 00:10:00  INFO (FFmpegEncoderEngine:174) - encoder         : Lavf52.64.2
2012-02-21 00:10:00  INFO (AbstractCmdlineEncoderEngine:258) - Video track 703672_0.jpeg successfully encoded using profile 'image-conversion.http'
2012-02-21 00:10:00  INFO (ComposerServiceImpl:864) - Deleted local copy of image file at /opt/matterhorn/opencast/workspace/collection/composer/703672_0_b8bfb926-ccf8-4584-947f-31dfc1683de0.tif
2012-02-21 00:10:01  INFO (TextAnalyzerServiceImpl:184) - Starting text extraction from http://matterhorntest.zserv.tuwien.ac.at/files/collection/composer/702564.tif
2012-02-21 00:10:02  INFO (TextAnalyzerServiceImpl:213) - Text extraction of http://matterhorntest.zserv.tuwien.ac.at/files/collection/composer/702564.tif finished, 0 lines found
2012-02-21 00:10:04  INFO (TextAnalyzerServiceImpl:225) - Finished text extraction of http://matterhorntest.zserv.tuwien.ac.at/files/collection/composer/702564.tif
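
To put a number on how often extraction comes back empty, the "finished, N lines found" messages in the excerpt above can be counted. The following is only a minimal sketch: it assumes the log wording is exactly as quoted here, and the function names are made up for illustration, not part of Matterhorn.

```python
import re

# Matches the "Text extraction of <url> finished, N lines found" message
# quoted in the log excerpt. \s+ tolerates the hard line wraps that mail
# clients add when logs are pasted.
PATTERN = re.compile(
    r"Text\s+extraction\s+of\s+(\S+)\s+finished,\s+(\d+)\s+lines\s+found"
)


def extraction_results(log_text):
    """Return a list of (url, line_count) pairs found in the log text."""
    return [(url, int(count)) for url, count in PATTERN.findall(log_text)]


def empty_ratio(results):
    """Fraction of extractions that found no text (0.0 if there are none)."""
    if not results:
        return 0.0
    empty = sum(1 for _, count in results if count == 0)
    return empty / len(results)
```

Reading the log file into a string and passing it to extraction_results() should give a rough failure rate to compare against the "1 in 20" impression.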

My suspect for the greyness is the line "swscaler: No accelerated colorspace conversion found from yuv420p to rgb24".

Does this indicate that we should be using a more recent version of ffmpeg than 0.6?
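
One way to narrow this down outside Matterhorn is to rerun the frame grab by hand and look at the resulting image. The log also shows a "first frame is no keyframe" warning for the frame grab, so a possible experiment is to move -ss after -i, which makes ffmpeg decode from the start of the file instead of seeking in the input (slower, but it avoids starting on a non-keyframe). This is an untested assumption, not a confirmed cause, and it is not known whether ffmpeg 0.6 behaves differently here. The helper below is a hypothetical sketch that only builds the argument list; "-strict inofficial" from the original command is left out.

```python
def frame_grab_command(source, seconds, output, seek_after_input=True,
                       ffmpeg="/usr/local/bin/ffmpeg"):
    """Build an ffmpeg argument list that grabs one frame at `seconds`.

    With seek_after_input=False the list mirrors the option order seen in
    the log (-ss before -i). With True, -ss is placed after -i instead.
    """
    seek = ["-ss", str(seconds)]
    cmd = [ffmpeg, "-y"]
    if not seek_after_input:
        cmd += seek          # input seeking, as in the logged command
    cmd += ["-i", source]
    if seek_after_input:
        cmd += seek          # output seeking: decode from the start
    cmd += ["-r", "1", "-vframes", "1", "-f", "image2",
            "-pix_fmt", "rgb24", output]
    return cmd
```

The list can be passed to subprocess.run(cmd, check=True) on a machine that has the recording; comparing the two variants for the same timestamp should show whether the seek position matters.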


We have MH 1.2 Rev. 11326 installed on CentOS.

Any input appreciated, Andreas


[email protected] wrote on Sat, 18 Feb 2012, subject "Re:...":

Hi Kristof,

allow me a short question: did you get any further in investigating the reason for the many grey images, with a correct slide only once in a while?

We are experiencing the same here, and this is the first time I have noticed it. Do you know of an easy trick to make this work? Did you look into it more deeply (so I don't have to ;) )?


Regards, Andreas

Kristof Keppens wrote on Tue, 22 Nov 2011, subject "Re:[Matterhorn-users]...":

Hi,

We are getting further with the setup of our Matterhorn infrastructure. So far most things work, and we are almost ready to launch the 1.2 version. However, the problem with the text extraction is still there and I haven't found a solution so far. I did find the reason why the text extraction fails: the tif file generated for text extraction is, most of the time, a blank grey image, always the same file size and solid grey. Once in a while a correct tif file is generated, and then the text extraction works fine.

I don't see a clear connection between the successful tif files and the failed ones (about 1 in 10 tif files is correct).
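
Since the blank images are described as always having the same file size, grouping the generated tif files by size is a quick way to list the suspect ones. This is a minimal sketch using only the Python standard library; the directory to scan and the function names are assumptions for illustration, and an identical size is only a hint, not proof, that a file is solid grey.

```python
import os
from collections import defaultdict


def group_by_size(directory, suffix=".tif"):
    """Map file size in bytes to the sorted names of files with that size."""
    sizes = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.lower().endswith(suffix) and os.path.isfile(path):
            sizes[os.path.getsize(path)].append(name)
    return dict(sizes)


def most_common_size(groups):
    """Return (size, names) for the largest group, or None if empty."""
    if not groups:
        return None
    size = max(groups, key=lambda s: len(groups[s]))
    return size, groups[size]
```

If most files fall into one size group, those are likely the blank images, and the remaining files can be checked for what the successful ones have in common (for example their position in the recording).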


Is anyone else experiencing these problems, and has anyone found a solution?

Thanks

Kristof Keppens
Ghent University

On 2011-10-13 14:56, Kristof Keppens wrote:

Hi,

I'm having some issues with the text extraction with our fresh 1.2
installation.
I keep getting the following error:

2011-10-13 13:03:31 WARN (TextAnalyzerServiceImpl:229) - Error extracting text from http://ic**.ugent.be:8080/files/collection/composer/550.tif
java.lang.IllegalArgumentException: The text cannot be empty
    at org.opencastproject.metadata.mpeg7.TextualImpl.<init>(TextualImpl.java:81)
    at org.opencastproject.textanalyzer.impl.TextAnalyzerServiceImpl.analyze(TextAnalyzerServiceImpl.java:324)
    at org.opencastproject.textanalyzer.impl.TextAnalyzerServiceImpl.extract(TextAnalyzerServiceImpl.java:194)
    at org.opencastproject.textanalyzer.impl.TextAnalyzerServiceImpl.process(TextAnalyzerServiceImpl.java:253)
    at org.opencastproject.job.api.AbstractJobProducer$JobRunner.call(AbstractJobProducer.java:184)
    at org.opencastproject.job.api.AbstractJobProducer$JobRunner.call(AbstractJobProducer.java:156)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

This error is repeated a number of times in the log. The text extraction
does not fail for every image, just for some, but as a result the
recording ends up with the status "failed" and the following error:

org.opencastproject.workflow.api.WorkflowOperationException: org.opencastproject.workflow.api.WorkflowOperationException: Text extraction failed on images from http://ic**.ugent.be:8080/files/mediapackage/5952f751-e8f9-41e5-b55d-7002ca31a67b/8fd9ca3d-cfbc-429a-a035-2ddcbf608412/logica_trimmed.avi


These are tests with manually uploaded files. Could that be
a factor in why it fails?

Thanks

Kristof Keppens
_______________________________________________
Matterhorn-users mailing list
[email protected]
http://lists.opencastproject.org/mailman/listinfo/matterhorn-users




-----------------------
[email protected]
01/58801 DW 41523
mobil: 0664/60 588 4523
TU Wien
DVR-Nummer: 0005886
-----------------------

