We had the same problem. I found a file in
/opt/matterhorn/felix/conf/services that seems to point Matterhorn at
the text extraction utility behind the error. The file is
org.opencastproject.textextractor.tesseract.TesseractTextExtractor.properties.
I removed the # symbol in that file (uncommenting the setting), restarted
my Matterhorn services, and text extraction started working. Hope this helps.
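
If you want to see which settings are commented out in that file before
editing it, a rough Python sketch along these lines might help. It is only
an illustration: the function name commented_settings is made up here, and
it simply assumes that commented-out settings look like "#key=value". It
does not know anything about Matterhorn itself.

```python
import re

# Matches a commented-out "key=value" line, e.g. "#some.key=/some/path".
# Plain comments such as "# just a note" do not match.
COMMENTED_SETTING = re.compile(r"^\s*#\s*([A-Za-z0-9_.\-]+)\s*=\s*(.*)$")


def commented_settings(text):
    """Return (line_number, key, value) for each commented-out setting."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = COMMENTED_SETTING.match(line)
        if match:
            found.append((number, match.group(1), match.group(2).strip()))
    return found
```

Read the properties file into a string and pass it to commented_settings();
each tuple it returns is a candidate line where removing the # would
enable the setting.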

On Tue, Nov 22, 2011 at 6:29 AM, Kristof Keppens <[email protected]> wrote:
> Hi,
>
> We are getting further with the setup of our matterhorn infrastructure, and
> so far most things work and we are almost ready to launch the 1.2 version.
> However, the problem with the text extraction is still there and I haven't
> found a solution so far. I did find the reason why the text extraction
> fails: the tif file generated for text extraction is, most of the time, a
> blank grey image, always the same file size and solid grey. Once in a while
> a correct tif file is generated, and the text extraction is fine then.
>
> I don't see a clear connection between the successful tif files and the
> failed ones (only about 1 in 10 tifs is correct).
>
> Is anyone else experiencing these problems, and has anyone found a solution?
>
> Thanks
>
> Kristof Keppens
> Ghent University
>
> On 2011-10-13 14:56, Kristof Keppens wrote:
>>
>> Hi,
>>
>> I'm having some issues with the text extraction with our fresh 1.2
>> installation.
>> I keep getting the following error:
>>
>> 2011-10-13 13:03:31 WARN (TextAnalyzerServiceImpl:229) - Error extracting text from http://ic**.ugent.be:8080/files/collection/composer/550.tif
>> java.lang.IllegalArgumentException: The text cannot be empty
>>   at org.opencastproject.metadata.mpeg7.TextualImpl.<init>(TextualImpl.java:81)
>>   at org.opencastproject.textanalyzer.impl.TextAnalyzerServiceImpl.analyze(TextAnalyzerServiceImpl.java:324)
>>   at org.opencastproject.textanalyzer.impl.TextAnalyzerServiceImpl.extract(TextAnalyzerServiceImpl.java:194)
>>   at org.opencastproject.textanalyzer.impl.TextAnalyzerServiceImpl.process(TextAnalyzerServiceImpl.java:253)
>>   at org.opencastproject.job.api.AbstractJobProducer$JobRunner.call(AbstractJobProducer.java:184)
>>   at org.opencastproject.job.api.AbstractJobProducer$JobRunner.call(AbstractJobProducer.java:156)
>>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>   at java.lang.Thread.run(Thread.java:662)
>>
>> This error is repeated a number of times in the log. The text extraction
>> does not fail for every image, just for some, but as a result the
>> recording has the status "failed" with the following error:
>>
>> org.opencastproject.workflow.api.WorkflowOperationException: org.opencastproject.workflow.api.WorkflowOperationException: Text extraction failed on images from
>> http://ic**.ugent.be:8080/files/mediapackage/5952f751-e8f9-41e5-b55d-7002ca31a67b/8fd9ca3d-cfbc-429a-a035-2ddcbf608412/logica_trimmed.avi
>>
>> These are tests with manually uploaded files; I'm not sure whether that
>> could be a factor in the failure.
>>
>> Thanks
>>
>> Kristof Keppens
>> _______________________________________________
>> Matterhorn-users mailing list
>> [email protected]
>> http://lists.opencastproject.org/mailman/listinfo/matterhorn-users
>



-- 
Jack Vant
System Engineer - Unix
Office of Information Technology
Boise State University
208-426-4443
208-863-0031