Re: [Matterhorn-users] Error extracting text

Kristof Keppens Tue, 22 Nov 2011 05:23:29 -0800

Hi,

We are getting further with the setup of our Matterhorn infrastructure, and so far most things work and we are almost ready to launch the 1.2 version. However, the problem with the text extraction is still there and I haven't found a solution so far. I did find the reason why the text extraction fails: the tif file generated for text extraction is, most of the time, a blank image, solid grey and always the same file size. Once in a while a correct tif file is generated, and the text extraction then works fine.

I don't see a clear connection between the successful tif files and the failed ones (only about 1 in 10 tifs is correct).
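
For anyone who wants to check whether their own frames show the same symptom, here is a minimal sketch of how a solid-colour image could be detected. It assumes the tif has already been decoded into a BufferedImage; the class and method names (BlankFrameCheck, isSolidColor) are made up for illustration and are not part of Matterhorn.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not Matterhorn code: flags frames in which every pixel is identical.
final class BlankFrameCheck {

    /** Returns true when every pixel has the same RGB value as the top-left pixel. */
    static boolean isSolidColor(BufferedImage image) {
        int reference = image.getRGB(0, 0);
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                if (image.getRGB(x, y) != reference) {
                    return false; // found a differing pixel, so the frame has content
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Build a synthetic solid grey frame, similar to the blank tifs described above.
        BufferedImage frame = new BufferedImage(64, 48, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < frame.getHeight(); y++) {
            for (int x = 0; x < frame.getWidth(); x++) {
                frame.setRGB(x, y, 0x808080);
            }
        }
        System.out.println("solid grey frame: " + isSolidColor(frame));

        // A single dark pixel is enough for the frame to count as having content.
        frame.setRGB(10, 10, 0x000000);
        System.out.println("after one dark pixel: " + isSolidColor(frame));
    }
}
```

A check like this only confirms the symptom (the extracted frame is blank); it says nothing about why the frame was extracted that way.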


Has anyone else experienced this problem and found a solution?

Thanks

Kristof Keppens
Ghent University

On 2011-10-13 14:56, Kristof Keppens wrote:

Hi,

I'm having some issues with the text extraction with our fresh 1.2
installation.
I keep getting the following error:

2011-10-13 13:03:31 WARN (TextAnalyzerServiceImpl:229) - Error extracting text from http://ic**.ugent.be:8080/files/collection/composer/550.tif
java.lang.IllegalArgumentException: The text cannot be empty
    at org.opencastproject.metadata.mpeg7.TextualImpl.<init>(TextualImpl.java:81)
    at org.opencastproject.textanalyzer.impl.TextAnalyzerServiceImpl.analyze(TextAnalyzerServiceImpl.java:324)
    at org.opencastproject.textanalyzer.impl.TextAnalyzerServiceImpl.extract(TextAnalyzerServiceImpl.java:194)
    at org.opencastproject.textanalyzer.impl.TextAnalyzerServiceImpl.process(TextAnalyzerServiceImpl.java:253)
    at org.opencastproject.job.api.AbstractJobProducer$JobRunner.call(AbstractJobProducer.java:184)
    at org.opencastproject.job.api.AbstractJobProducer$JobRunner.call(AbstractJobProducer.java:156)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

This error is repeated a number of times in the log. The text extraction
does not fail for every image, just for some, but as a result the
recording ends up with the status failed and the following error:

org.opencastproject.workflow.api.WorkflowOperationException:
org.opencastproject.workflow.api.WorkflowOperationException: Text
extraction failed on images from
http://ic**.ugent.be:8080/files/mediapackage/5952f751-e8f9-41e5-b55d-7002ca31a67b/8fd9ca3d-cfbc-429a-a035-2ddcbf608412/logica_trimmed.avi


These are tests with manually uploaded files; I'm not sure whether that
could be a factor in why it fails.

Thanks

Kristof Keppens
_______________________________________________
Matterhorn-users mailing list
[email protected]
http://lists.opencastproject.org/mailman/listinfo/matterhorn-users

