Marichi Gupta created TIKA-2908:
-----------------------------------

             Summary: TikaException: Failed to close temporary resource - how 
to fix?
                 Key: TIKA-2908
                 URL: https://issues.apache.org/jira/browse/TIKA-2908
             Project: Tika
          Issue Type: Bug
          Components: ocr, parser
    Affects Versions: 1.21
            Reporter: Marichi Gupta


I am using Apache Tika 1.21 on Windows 10 with JRE 1.8.0_181, and I've imported 
Tika using Maven with the following dependencies:

{{<dependencies>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.21</version>
  </dependency>
</dependencies>}}

I have the code below for performing OCR using Tesseract (which I have 
independently tested and know to be working):

{{public static void OCRTest() { }}
{{    try { }}
{{        BufferedImage im = ImageIO.read(new File(OCR_IMAGE));}}
{{        TesseractOCRConfig config = new TesseractOCRConfig();}}
{{        config.setTessdataPath("C:\\Program Files\\Tesseract-OCR\\tessdata");}}
{{        config.setTesseractPath("C:\\Program Files\\Tesseract-OCR");}}
{{        ParseContext parseContext = new ParseContext();}}
{{        parseContext.set(TesseractOCRConfig.class, config);}}
{{        TesseractOCRParser parser = new TesseractOCRParser();}}
{{        BodyContentHandler handler = new BodyContentHandler();}}
{{        Metadata metadata = new Metadata();}}
{{        try { }}
{{            parser.parse(im, handler, metadata, parseContext);}}
{{            System.out.println(handler.toString());}}
{{        } catch (SAXException e) { e.printStackTrace(); } }}
{{        catch (TikaException e) { e.printStackTrace(); } }}
{{    } catch (IOException e) { e.printStackTrace(); } }}
{{} }}

I run into the following exception:

{{org.apache.tika.exception.TikaException: Failed to close temporary resources
    at org.apache.tika.io.TemporaryResources.dispose(TemporaryResources.java:174)
    at org.apache.tika.parser.ocr.TesseractOCRParser.parse(TesseractOCRParser.java:251)
    at test.test.App.OCRTest(App.java:46)
    at test.test.App.main(App.java:30)
Caused by: java.nio.file.FileSystemException: C:\Users\m\AppData\Local\Temp\apache-tika-2643805894084124300.tmp: The process cannot access the file because it is being used by another process.}}

The tmp file is present in the Temp folder. I have downloaded the source code 
and stepped through it with the debugger: the error comes from the attempt to 
close the tmp file. Another issue, TIKA-1732 
(https://issues.apache.org/jira/browse/TIKA-1732), reports the same exception, 
although with the AutoDetectParser and not Tesseract. That issue seemed to come 
down to a conflict among the imported jars, but I run into the problem even with 
only the Apache Tika libraries installed. I suspect this is a concurrency issue, 
but I can't pinpoint the conflict.
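
One way the "being used by another process" error can arise on Windows, 
independent of Tika, is that a file is deleted while a handle to it is still 
open. Whether that is what happens inside TesseractOCRParser is an assumption 
here and is not confirmed by the trace above. The sketch below uses only the 
JDK; the class TempFileCleanup and the method writeThenDelete are hypothetical 
names for illustration. It shows the ordering that avoids the problem: close 
the stream first, then delete.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration, not Tika code.
public class TempFileCleanup {

    /** Writes data to a fresh temp file, closes the stream, then deletes the file. */
    static Path writeThenDelete(byte[] data) {
        try {
            Path tmp = Files.createTempFile("apache-tika-demo-", ".tmp");
            // try-with-resources closes the stream before the delete below runs,
            // so no handle to the file is left open at deletion time.
            try (OutputStream out = Files.newOutputStream(tmp)) {
                out.write(data);
            }
            Files.delete(tmp);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path tmp = writeThenDelete(new byte[] {1, 2, 3});
        System.out.println("still exists: " + Files.exists(tmp));
    }
}
```

If the delete were attempted inside the try-with-resources block instead, the 
stream would still be open at that point, which is the kind of ordering that 
can trigger this error on Windows.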

I don't run into this issue when using Tika's AutoDetectParser, only with 
the TesseractOCRParser. This is an important part of an application I'm working 
on, so I would really appreciate any insights on how to proceed.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
