We had this same problem. I found a file in /opt/matterhorn/felix/conf/services that seems to point to the text extraction utility that causes the error. The file is org.opencastproject.textextractor.tesseract.TesseractTextExtractor.properties. I removed the # comment marker from the setting in that file, restarted my Matterhorn services, and we were in business. Hope this helps.
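In case it is useful, here is a rough Python sketch of the "remove the #" step. It is only an illustration, not something that ships with Matterhorn: the helper names and the key `example.binary.path` are made up, and I am not quoting the real contents of the properties file. The sketch only relies on `#` starting a comment line in a Java .properties file. Back up the real file before changing it.

```python
import re
from typing import List


def commented_settings(text: str) -> List[str]:
    """Return the keys of commented-out lines that look like 'key=value'."""
    return re.findall(r"^[ \t]*#[ \t]*([\w.\-]+)[ \t]*=", text, re.MULTILINE)


def uncomment_property(text: str, key: str) -> str:
    """Remove the leading '#' from the line that sets `key`; leave the rest alone."""
    pattern = re.compile(
        r"^[ \t]*#[ \t]*(" + re.escape(key) + r"[ \t]*[=:].*)$",
        re.MULTILINE,
    )
    return pattern.sub(r"\1", text)


if __name__ == "__main__":
    # Hypothetical file contents; the key below is an assumption for the demo.
    sample = (
        "# Example only; the key name is made up.\n"
        "#example.binary.path=/usr/bin/example\n"
    )
    print(commented_settings(sample))
    print(uncomment_property(sample, "example.binary.path"))
```

To use it on the real file, read the file into a string, check what `commented_settings` reports, write back the result of `uncomment_property` for the key you actually want to enable, and then restart the Matterhorn services.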
On Tue, Nov 22, 2011 at 6:29 AM, Kristof Keppens <[email protected]> wrote:
> Hi,
>
> We are getting further with the setup of our matterhorn infrastructure, and
> so far most things work and we are almost ready to launch the 1.2 version.
> However the problem with the text extraction is still there and I haven't
> found a solution so far. I did find the reason why the text extraction
> fails, the tif file generated for text extraction is most of the time a
> blank grey image, always the same file size and solid grey. Once in a while
> there is a correct tif file generated and the text extraction is fine then.
>
> I don't see a clear connection between the successful tif files and the
> failed ( it's a ratio of about 1/10 tif's are correct ) ones.
>
> Is anyone else experiencing these problems and found a solution ?
>
> Thanks
>
> Kristof Keppens
> Ghent University
>
> On 2011-10-13 14:56, Kristof Keppens wrote:
>>
>> Hi,
>>
>> I'm having some issues with the text extraction with our fresh 1.2
>> installation.
>> I keep getting the following error:
>>
>> 2011-10-13 13:03:31 WARN (TextAnalyzerServiceImpl:229) - Error extracting text from http://ic**.ugent.be:8080/files/collection/composer/550.tif
>> java.lang.IllegalArgumentException: The text cannot be empty
>>     at org.opencastproject.metadata.mpeg7.TextualImpl.<init>(TextualImpl.java:81)
>>     at org.opencastproject.textanalyzer.impl.TextAnalyzerServiceImpl.analyze(TextAnalyzerServiceImpl.java:324)
>>     at org.opencastproject.textanalyzer.impl.TextAnalyzerServiceImpl.extract(TextAnalyzerServiceImpl.java:194)
>>     at org.opencastproject.textanalyzer.impl.TextAnalyzerServiceImpl.process(TextAnalyzerServiceImpl.java:253)
>>     at org.opencastproject.job.api.AbstractJobProducer$JobRunner.call(AbstractJobProducer.java:184)
>>     at org.opencastproject.job.api.AbstractJobProducer$JobRunner.call(AbstractJobProducer.java:156)
>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>     at java.lang.Thread.run(Thread.java:662)
>>
>> This error is repeated a number of times in the log. The text extraction
>> does not fail for every image, just for some images, but as a result the
>> recording has the status failed with following error :
>>
>> org.opencastproject.workflow.api.WorkflowOperationException:
>> org.opencastproject.workflow.api.WorkflowOperationException: Text
>> extraction failed on images from
>> http://ic**.ugent.be:8080/files/mediapackage/5952f751-e8f9-41e5-b55d-7002ca31a67b/8fd9ca3d-cfbc-429a-a035-2ddcbf608412/logica_trimmed.avi
>>
>> These are tests with manually uploaded files, not sure if this could be
>> a factor why it fails?
>>
>> Thanks
>>
>> Kristof Keppens
>> _______________________________________________
>> Matterhorn-users mailing list
>> [email protected]
>> http://lists.opencastproject.org/mailman/listinfo/matterhorn-users
>

--
Jack Vant
System Engineer - Unix
Office of Information Technology
Boise State University
208-426-4443
208-863-0031
