[ https://issues.apache.org/jira/browse/TIKA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130791#comment-16130791 ]
Stefan Karner edited comment on TIKA-2434 at 8/17/17 4:44 PM:
--------------------------------------------------------------
Tim, thank you for the explanation.
This seems to be a misunderstanding. My workflows have processed several
thousand PDFs, and as far as I know, they are all files with a text layer and
inline images, but no attachments.
I have no need to do any OCR on the files with Tesseract through Tika; when the
PDF has no text layer, I OCR it with Abbyy Finereader.
was (Author: stefankah):
Tim, thank you for the explanation.
This seems to be a misunderstanding. My workflows have processed several
thousand PDFs, and as far as I know, they are all files with a text layer and
inline images.
I have no need to do any OCR on the files with Tesseract through Tika; when the
PDF has no text layer, I OCR it with Abbyy Finereader.
> Language detection slow, cpu intensive, CLI interrupts work
> -----------------------------------------------------------
>
> Key: TIKA-2434
> URL: https://issues.apache.org/jira/browse/TIKA-2434
> Project: Tika
> Issue Type: Bug
> Components: cli
> Affects Versions: 1.16
> Environment: OS X 10.11.6, JRE 1.8.0_25
> Reporter: Stefan Karner
>
> Since version 1.16, when using tika -l FILE, it takes a lot longer than e.g.
> 1.15.
> Also, when batch processing a bunch of files in the background, the Java
> runtime icon pops up when processing the next file, stealing the input focus
> from whatever other application I'm currently working on, thus constantly
> interrupting my work.
> Also, the Java runtime uses from 100% to 400% CPU when executing Tika.