[
https://issues.apache.org/jira/browse/TIKA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109266#comment-16109266
]
Tim Allison edited comment on TIKA-2434 at 8/1/17 4:45 PM:
-----------------------------------------------------------
1) Great! [~chrismattmann], any recommendations for adding headless to the brew
script? Can anyone see any fallout from running Tika in headless mode? I
should probably run Tika headless against our regression corpus to see if there
are any diffs.
2) In TIKA-2374, [~gagravarr] requested that this be added for the -z option.
However, I thought it would be bizarre for a user to be able to extract all
images but then not get text via OCR on those images. [~gagravarr], should I
back off and extract inline images only for -z but not for text
extraction? Or should we leave this as is?
So that I understand, you want to run OCR on regular "attachment" images inside
PDFs but not on their inline images?
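
For anyone who wants to try the headless idea from 1) locally, here is a minimal
sketch. It assumes "headless" means the standard java.awt.headless system
property; the class name HeadlessCheck is made up for illustration and is not
part of Tika.

```java
import java.awt.GraphicsEnvironment;

public class HeadlessCheck {
    /** Requests headless mode, then reports whether AWT considers itself headless. */
    static boolean enableHeadless() {
        // Set the property before any AWT class is initialized;
        // AWT reads it once and caches the result.
        System.setProperty("java.awt.headless", "true");
        return GraphicsEnvironment.isHeadless();
    }

    public static void main(String[] args) {
        System.out.println("headless = " + enableHeadless());
    }
}
```

The same property can be passed on the command line with
-Djava.awt.headless=true, which is probably the easier route for a launcher
script.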
> Language detection slow, cpu intensive, CLI interrupts work
> -----------------------------------------------------------
>
> Key: TIKA-2434
> URL: https://issues.apache.org/jira/browse/TIKA-2434
> Project: Tika
> Issue Type: Bug
> Components: cli
> Affects Versions: 1.16
> Environment: OS X 10.11.6, JRE 1.8.0_25
> Reporter: Stefan Karner
>
> Since version 1.16, when using tika -l FILE, it takes a lot longer than e.g.
> 1.15.
> Also, when batch processing a bunch of files in the background, the Java
> runtime icon pops up when processing the next file, stealing the input focus
> from whatever other application I'm currently working on, thus constantly
> interrupting my work.
> Also, the Java runtime uses from 100% to 400% CPU when executing Tika.