[ 
https://issues.apache.org/jira/browse/TIKA-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17486525#comment-17486525
 ] 

Tim Allison commented on TIKA-3668:
-----------------------------------

bq. The CPU utilization is purely java process

Two things come to mind initially.
1) If our PDFParser is rendering pages for OCR even when Tesseract is turned 
off, then that would cause this.  I need to check whether your header request 
is actually turning off Tesseract in the PDFParser.
2) If the forked process is restarting often, then that can peg a machine.  It 
looks like there are some warmup costs each time the forked process is 
restarted, and CPU usage can take ~a minute to come back down to low levels.
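
To help check point 1, here is a minimal sketch of a request that asks 
tika-server to skip OCR for a PDF.  The header names (X-Tika-PDFOcrStrategy, 
X-Tika-OCRskipOcr), the default port 9998, and the helper name 
build_rmeta_request are assumptions for illustration and should be verified 
against the tika-server version in use.  The function only builds the request; 
it does not send it.

```python
import urllib.request

# Assumed header names for disabling OCR; verify against your tika-server version.
NO_OCR_HEADERS = {
    "X-Tika-PDFOcrStrategy": "no_ocr",  # ask the PDFParser not to render pages for OCR
    "X-Tika-OCRskipOcr": "true",        # ask the Tesseract parser to skip OCR
}


def build_rmeta_request(pdf_bytes, base_url="http://localhost:9998"):
    """Build (but do not send) a PUT request to /rmeta with OCR-disabling headers.

    build_rmeta_request is a hypothetical helper, not part of Tika.
    """
    headers = {"Content-Type": "application/pdf", "Accept": "application/json"}
    headers.update(NO_OCR_HEADERS)
    return urllib.request.Request(
        url=base_url + "/rmeta",
        data=pdf_bytes,
        headers=headers,
        method="PUT",
    )
```

Passing the result to urllib.request.urlopen() would send it.  Comparing CPU 
usage with and without these headers on the same file may show whether page 
rendering is still happening when OCR is supposedly off.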

bq. Is there a way we can see validate child process being invoked in top 
command or only through logs we can check it.

Not sure what you mean by this.  Parent/child processes are fairly well logged, 
and you can use different logging configs in the parent and child processes so 
that you can see what's going on (via the jvmargs in the <server> element).

Last question for now: are you calling the /tika or /rmeta endpoints, or one of 
the new pipes/async endpoints?



> High CPU utilization in Tika 2.2.0
> ----------------------------------
>
>                 Key: TIKA-3668
>                 URL: https://issues.apache.org/jira/browse/TIKA-3668
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Manjunath Dhongadi
>            Priority: Major
>
> Recently we upgraded Tika version from 1.26 to 2.2.0.
> CPU utilization has increased drastically (6 to 8 times higher) in both 
> cases, with Tesseract enabled and with Tesseract disabled.
> We are using tika-parsers-standard-package of 2.2.0.
> Is this normal behavior for Tika 2.2.0?
> Are any tuning parameters available for this?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
