[
https://issues.apache.org/jira/browse/TIKA-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17396332#comment-17396332
]
Abha commented on TIKA-3518:
----------------------------
Please find my responses inline:
When you say the ProcessBuilder isn't creating the tmp file,
does that mean that tesseract is failing to run at all?
- It is not failing to run. It extracts the metadata correctly, but it does
not extract the image content: the tmp output (txt) file is never created, so
the check at the link below fails and no content is extracted:
https://github.com/apache/tika/blob/6f4365b9ef03ac99de21f10a6e3f2a98452c5007/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-ocr-module/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java#L289
Have you tried 1.27?
- Yes, the same issue occurs with 1.27 as well.
The issue occurs starting with Tesseract 4.0.0; it works fine with
Tesseract 3.x and 4.0.0alpha.
Also, I am able to run Tesseract OCR from the command line, and it extracts
the content correctly.
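To separate a Tika problem from a ProcessBuilder/environment problem, the same tesseract call can be reproduced from a small standalone Java program. The following is only a minimal sketch, not Tika's actual code: the class name TesseractProbe and its helper methods are hypothetical, and it assumes that tesseract is on the PATH seen by the JVM and that it is invoked as "tesseract <image> <outputbase>", writing its text to <outputbase>.txt. It uses only JDK 1.8 APIs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical diagnostic helper; not part of Tika.
class TesseractProbe {

    // The tesseract CLI takes the input image followed by an output base name.
    static List<String> buildCommand(String tesseractPath, Path input, Path outputBase) {
        return Arrays.asList(tesseractPath, input.toString(), outputBase.toString());
    }

    // For plain-text output, tesseract writes to "<outputBase>.txt".
    static Path expectedOutputFile(Path outputBase) {
        return Paths.get(outputBase.toString() + ".txt");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 1) {
            System.err.println("usage: java TesseractProbe <image>");
            return;
        }
        Path input = Paths.get(args[0]).toAbsolutePath();
        // Use a fresh temp directory so a stale output file cannot mask a failure.
        Path outputBase = Files.createTempDirectory("tess-probe").resolve("out");

        // Assumption: "tesseract" resolves via PATH in the JVM's environment.
        ProcessBuilder pb = new ProcessBuilder(buildCommand("tesseract", input, outputBase));
        pb.inheritIO(); // show tesseract's own stdout/stderr on the console
        int exitCode = pb.start().waitFor();

        Path out = expectedOutputFile(outputBase);
        System.out.println("exit code: " + exitCode);
        System.out.println("output file: " + out);
        System.out.println("exists: " + Files.exists(out));
        if (Files.exists(out)) {
            System.out.println("size (bytes): " + Files.size(out));
        }
    }
}
```

If the output file is created here but not when running through Tika, the difference is more likely in the arguments or environment that Tika passes to the process than in tesseract itself; if it is missing here too, the exit code and the messages tesseract prints should point to the cause.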
> Tika 1.26 not Working with Tesseract 4.0 and Higher Version
> -----------------------------------------------------------
>
> Key: TIKA-3518
> URL: https://issues.apache.org/jira/browse/TIKA-3518
> Project: Tika
> Issue Type: Bug
> Components: ocr, tika-batch, tika-dl, tika-server
> Affects Versions: 1.26
> Reporter: Abha
> Priority: Major
>
> ProcessBuilder not creating tmp file for Tesseract 4.1 and Higher Versions
> With Tika 1.26 and JDK 1.8
> I am working on a project that integrates Tika and Tesseract OCR. The Tika
> version is 1.26 and the JDK is 1.8. Any Tesseract version earlier than 4.0
> works fine and extracts the image/PDF data correctly, but upgrading Tesseract
> OCR to 4.1.1 or higher results in no data extraction. I debugged the issue and
> found that the ProcessBuilder is not creating the temporary txt output file
> from which the Tesseract OCR result is read, which causes the issue. Is this
> a version compatibility issue, and how can it be resolved?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)