[ https://issues.apache.org/jira/browse/TIKA-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606373#comment-16606373 ]
Tim Allison commented on TIKA-2725:
-----------------------------------

{quote}
Ideally, tika server is dockerized, runs on swarm as a service. In addition, it has a healthcheck mechanism, say something ... like an HTTP GET request with return code 200. Docker will run this hc periodically and, if it fails, will restart tika server. However, we are far away from that. Two ways to go, fmpov ...
1. Your second option or ... an OS daemon which will check tika server availability or something like that. We can use cron on Linux to run our "healthcheck" and, if it detects some anomalies, it will restart the server. Probably for Windows we can find such a mechanism as well.
{quote}

CommonsExec?

> Make tika-server robust against ooms/infinite loops/memory leaks
> ----------------------------------------------------------------
>
>                 Key: TIKA-2725
>                 URL: https://issues.apache.org/jira/browse/TIKA-2725
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Major
>
> Currently, tika-server is vulnerable to OOMs, infinite loops and memory
> leaks. I see two ways of making it robust:
> 1) use the ForkParser
> 2) have tika-server spawn a child process that actually runs the server, and
> put a watcher thread in the child that will kill the child on
> OOM/timeout/after x files. The parent process can then restart the child if
> it dies.
> I somewhat prefer 2) so that we don't have to doubly pass the inputstream. I
> propose 2), and I propose making it optional in Tika 1.x, but then the
> default in Tika 2.x. We could also add a status ping from parent to child in
> case the child gets caught up in stop-the-world GC (h/t [~bleskes]).
> Other options/recommendations?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
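
For what it's worth, the parent half of option 2) could look roughly like the sketch below. It uses only the JDK's ProcessBuilder (Java 9+ for Redirect.DISCARD), not CommonsExec, and it is not taken from tika-server: the class name ChildSupervisor, the method supervise and the maxStarts bound are all hypothetical, chosen for illustration. The child-side watcher thread and the parent-to-child status ping are left out.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parent-side loop: start the child, wait for it to die, start it again. */
final class ChildSupervisor {

    /**
     * Runs the given command as a child process and restarts it every time it exits.
     * maxStarts is an assumption that keeps the sketch finite; a real parent would
     * presumably keep looping until it is shut down itself.
     * Returns the exit code of each child run, in order.
     */
    static List<Integer> supervise(List<String> command, int maxStarts)
            throws IOException, InterruptedException {
        List<Integer> exitCodes = new ArrayList<>();
        for (int start = 0; start < maxStarts; start++) {
            Process child = new ProcessBuilder(command)
                    // Merge stderr into stdout and discard both so the child
                    // cannot block on a full pipe that nobody reads.
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            // Blocks until the child exits, e.g. because its own watcher
            // thread shut it down on OOM/timeout/after x files.
            int exitCode = child.waitFor();
            exitCodes.add(exitCode);
        }
        return exitCodes;
    }
}
```

The command list would be whatever launches the process that actually runs the server. Since waitFor() only notices a child that has exited, a child stuck in a long GC pause would go undetected here, which is the gap the status ping is meant to close.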