[ 
https://issues.apache.org/jira/browse/TIKA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130791#comment-16130791
 ] 

Stefan Karner edited comment on TIKA-2434 at 8/17/17 4:44 PM:
--------------------------------------------------------------

Tim, thank you for the explanation.

This seems to be a misunderstanding. My workflows have processed several 
thousand PDFs, and as far as I know, they are all files with a text layer and 
inline images, but no attachments.

I have no need to do any OCR on the files with Tesseract through Tika; when the 
PDF has no text layer, I OCR it with ABBYY FineReader.


was (Author: stefankah):
Tim, thank you for the explanation.

This seems to be a misunderstanding. My workflows have processed several 
thousand PDFs, and as far as I know, they are all files with a text layer and 
inline images.

I have no need to do any OCR on the files with Tesseract through Tika; when the 
PDF has no text layer, I OCR it with ABBYY FineReader.

> Language detection slow, cpu intensive, CLI interrupts work
> -----------------------------------------------------------
>
>                 Key: TIKA-2434
>                 URL: https://issues.apache.org/jira/browse/TIKA-2434
>             Project: Tika
>          Issue Type: Bug
>          Components: cli
>    Affects Versions: 1.16
>         Environment: OS X 10.11.6, JRE 1.8.0_25
>            Reporter: Stefan Karner
>
> Since version 1.16, running tika -l FILE takes a lot longer than it did in 
> e.g. 1.15.
> Also, when batch processing a bunch of files in the background, the Java 
> runtime icon pops up when processing the next file, stealing the input focus 
> from whatever other application I'm currently working on, thus constantly 
> interrupting my work.
> Also, the Java runtime uses between 100% and 400% CPU when executing Tika.



