[jira] [Commented] (TIKA-3668) High CPU utilization in Tika 2.2.0

Manjunath Dhongadi (Jira) Wed, 02 Feb 2022 08:36:37 -0800


    [ 
https://issues.apache.org/jira/browse/TIKA-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17485945#comment-17485945
 ]


Manjunath Dhongadi commented on TIKA-3668:
------------------------------------------

We call tika-server through its REST API.

When we do not need Tesseract, we pass the "X-Tika-OCRskipOcr: true" header. 
Otherwise, as you mentioned, Tesseract is invoked automatically whenever images 
are extracted.
We have seen the high CPU usage with PDFs and with other formats; it is not 
specific to any one file format.
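
For anyone trying to reproduce this, the call described above might look 
roughly like the sketch below. It is only an illustration, not our actual 
client: the host, the port (9998, the tika-server default), the PUT /tika 
endpoint, the Accept header, and the helper names build_tika_request and 
extract_text are all assumptions. Only the "X-Tika-OCRskipOcr: true" header 
is taken from the comment. http.client is used because it sends header names 
exactly as written.

```python
import http.client
from typing import Dict, Tuple

# Header name as quoted in the comment; everything else here is assumed.
SKIP_OCR_HEADER = "X-Tika-OCRskipOcr"


def build_tika_request(skip_ocr: bool) -> Tuple[str, str, Dict[str, str]]:
    """Return (method, path, headers) for a plain-text extraction call."""
    headers = {"Accept": "text/plain"}
    if skip_ocr:
        # Ask tika-server not to run Tesseract for this request.
        headers[SKIP_OCR_HEADER] = "true"
    return "PUT", "/tika", headers


def extract_text(data: bytes, skip_ocr: bool = True,
                 host: str = "localhost", port: int = 9998) -> str:
    """Send the document bytes to an (assumed) local tika-server."""
    method, path, headers = build_tika_request(skip_ocr)
    conn = http.client.HTTPConnection(host, port, timeout=60)
    try:
        conn.request(method, path, body=data, headers=headers)
        resp = conn.getresponse()
        body = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"tika-server returned HTTP {resp.status}")
        return body.decode("utf-8", errors="replace")
    finally:
        conn.close()
```

Running the same document through extract_text with skip_ocr=True and then 
skip_ocr=False, while watching CPU on the server, would show whether the 
extra load depends on OCR at all.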

> High CPU utilization in Tika 2.2.0
> ----------------------------------
>
>                 Key: TIKA-3668
>                 URL: https://issues.apache.org/jira/browse/TIKA-3668
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Manjunath Dhongadi
>            Priority: Major
>
> We recently upgraded Tika from 1.26 to 2.2.0.
> CPU utilization has risen drastically (6 to 8 times higher), both with 
> Tesseract enabled and with Tesseract disabled.
> We are using tika-parsers-standard-package 2.2.0.
> Is this normal behavior for Tika 2.2.0? 
> Are any tuning parameters available to reduce it?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
