gracefully...well... give --spawnChild a try. That forks a child process that is the server. Now, unless you put a bunch of these behind a load balancer, your client will have to be resilient while the server is restarting. The other problem with this in a multithreaded environment is that you can't necessarily tell which file killed the server: threadA sends fileA, which takes a while to process; threadB sends fileB, which causes an OOM; the server dies before completing fileA; and your clients can't tell which file caused the problem.
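
A minimal sketch of the client-side resilience described above, in Python with only the standard library. It assumes the server takes the file via HTTP PUT at the /unpack/all endpoint from the original question; the names put_file and with_retries, the retry count, and the delays are hypothetical and only for illustration. The idea is to treat connection-level failures as "the child may be restarting" and back off before trying again.

```python
import time
import urllib.error
import urllib.request

# Endpoint taken from the original question; adjust for your deployment.
TIKA_URL = "http://localhost:9998/unpack/all"


def put_file(data: bytes, url: str = TIKA_URL, timeout: float = 60.0) -> bytes:
    """Send one file to the server with HTTP PUT and return the raw response body."""
    request = urllib.request.Request(url, data=data, method="PUT")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def with_retries(send, attempts: int = 5, base_delay: float = 1.0, sleep=time.sleep):
    """Call send() until it succeeds, backing off while the server is unreachable.

    Hypothetical helper: attempts and base_delay are illustrative defaults.
    """
    for attempt in range(attempts):
        try:
            return send()
        except urllib.error.HTTPError:
            # The server answered with an error status, so it is up;
            # retrying the same request is unlikely to help.
            raise
        except OSError:
            # Connection refused/reset or a timeout: the server may be restarting.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

A caller would wrap each request, for example `with_retries(lambda: put_file(payload))`, where `payload` holds the bytes of the file. Note that this only covers the restart window. It does not solve the attribution problem: a file that fails on every attempt may be the culprit or may just have been in flight when another file killed the server, and retrying a real bomb will take the server down again.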
That said, it's what we have for robustness in tika-server.

On Thu, Jun 25, 2020 at 9:41 AM Nicholas DiPiazza <[email protected]> wrote:

> I have an application of Tika server that I'm sure is pretty common.
>
> I have parse nodes that download files from data sources, and will need to
> parse out the content and metadata from these files. But it needs to be
> resilient to OOM's and needs to time out gracefully.
>
> Up until now. I've been using this project here:
> https://github.com/nddipiazza/tika-fork to parse files. This manages a pool
> of JVMs and pushes the requests through them. It makes it so if a file is a
> bomb and blows up the JVM, it will not affect my program.
>
> However, when I use this out in the wild, I get a lot of strange timeouts
> that I can't reproduce locally. Related to system resources on those local
> systems I guess but I can't really figure out what the problem is.
>
> So I'm thinking instead I will try out a different approach.
>
> I would like to have each parser node have it's own Tika Server running,
> and I'll just use the endpoint
>
> http://localhost:9998/unpack/all
>
> But I'm worried this will be plagued by the same problems that prompted me
> to go to the tika-fork parser. Where this server will continually go down
> due to OOMs because of random files in the wild that come in cause tika
> bombs or cpu spikes due to infinite loops, etc.
>
> How is everyone else managing to do this in the field? Is there a way to
> configure a Tika Fork parser on the Tika server so that it does not crash
> upon zip bombs, excel bombs, etc?
>
> -Nicholas DiPiazza
