[ 
https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701938#comment-16701938
 ] 

Tim Allison commented on TIKA-2776:
-----------------------------------

Thank you for the follow-up!

To confirm/summarize...
1. I introduced a change in behavior (bug) into legacy server mode in 1.19 
(maybe 1.18?) that causes tika-server to return 'not available' forever after 
an OOM.  The legacy behavior was to ignore OOMs and _hope_ nothing too bad 
happened to your JVM.  That said, the change of behavior I introduced is bad, 
very bad.  I've fixed this in 1.20, which should be out in a few weeks.
2. tika-server in {{-spawnChild}} mode was restarting the child because you were 
getting timeouts.  This caused problems with Manifold.  You've bumped out the 
timeout to ~16 minutes, and you currently don't have any files that take longer 
than that...so all appears to work for now.
3. I _think_ we found that {{-spawnChild}} was behaving as designed.  To 
confirm, we did not find that the parent process shut down, and we did 
find that the child restarted within a few seconds.  Is this correct?

My opinion/advice:
Depending on the nature of your documents, if you have large enough batches of 
crazy enough documents, you will eventually hit an infinite loop, and the child 
will time out and restart.  So, for now, you've papered over the problem by 
bumping out the timeout, but the timeouts will eventually happen.  So, what can 
we do in Tika, what can Manifold do, and what can you do to help avoid this 
eventuality?
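
On the client side, one option is to treat a 503 or a closed-socket error as 
"the child is restarting" and retry after a short wait, instead of failing the 
job.  The sketch below is only an illustration of that idea, not Manifold's or 
Tika's actual code.  The names {{call_with_retry}} and {{send}}, the attempt 
limit, and the delays are all assumptions chosen for the example.

```python
import time

# tika-server returns 503 while the child process is shutting down.
RETRYABLE_STATUS = {503}


def call_with_retry(send, max_attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical zero-argument callable that performs one request
    and returns (status_code, body). It may raise OSError, for example a
    ConnectionResetError when the child's socket is closed mid-request.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            status, body = send()
            if status not in RETRYABLE_STATUS:
                return status, body
            last_error = RuntimeError("server returned %d" % status)
        except OSError as exc:
            last_error = exc
        # Back off exponentially, but do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    raise last_error
```

Retrying only helps the files that were caught in someone else's restart.  A 
file that itself triggers the timeout will fail on every attempt, so the 
attempt limit matters: once it is reached, the client should log the file and 
move on rather than retry forever.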

Again, many, many thanks for your patience in getting the logging up and 
running.  Based on our interaction, I still need to improve our wiki page on 
logging with tika-server.  


> Tika server child restart
> -------------------------
>
>                 Key: TIKA-2776
>                 URL: https://issues.apache.org/jira/browse/TIKA-2776
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Mario Bisonti
>            Assignee: Tim Allison
>            Priority: Blocker
>             Fix For: 2.0.0, 1.20
>
>         Attachments: Log.zip, MCF_JOB.png, log4j.xml, log4j_child.xml, 
> log4j_child.xml, man_tika.zip, tikalogchild.log
>
>
> Hello.
> I use tika server standalone started with the option:
> java -jar /opt/tika/tika-server-1.19.1.jar -spawnChild
> I use ManifoldCF and Solr to index file using tika server.
> Indexing crashes continuously because I get many errors like:
> Tika down, retrying: Connection reset
> etc.
> I suspect that, when a process is restarted, the client crashes, as mentioned 
> here:
> _If the child process is in the process of shutting down, and it gets a new 
> request it will return 503 -- Service Unavailable. If the server times out on 
> a file, the client will receive an IOException from the closed socket. Note 
> that all other files that are being processed will end with an IOException 
> from a closed socket when the child process shuts down; e.g. if you send 
> three files to tika-server concurrently, and one of them causes a 
> catastrophic problem requiring the child to shut down, you won't be able to 
> tell which file caused the problems. In the future, we may implement a 
> gentler shutdown than we currently have._
> as reported here https://wiki.apache.org/tika/TikaJAXRS
> How can I work around it?
> Thanks a lot
> Mario



