Cool. I'll give this a try. Thanks!

On Thu, Jun 25, 2020 at 9:36 AM Tim Allison <[email protected]> wrote:
> gracefully...well... give --spawnChild a try.
>
> That forks a child process that is the server. Now, unless you put a bunch
> of these behind a load balancer, your client will have to be resilient if
> the server is restarting. The other problem with this in a multithreaded
> environment is that you can't necessarily tell which file killed the
> server: threadA sends fileA, which takes a while to process; threadB sends
> fileB, which causes an OOM; the server dies before completing fileA, and
> your clients can't tell which file caused the problem.
>
> That said, it's what we have for robustness in tika-server.
>
> On Thu, Jun 25, 2020 at 9:41 AM Nicholas DiPiazza <[email protected]> wrote:
>
> > I have an application of Tika server that I'm sure is pretty common.
> >
> > I have parse nodes that download files from data sources and will need to
> > parse out the content and metadata from these files. But it needs to be
> > resilient to OOMs and needs to time out gracefully.
> >
> > Up until now, I've been using this project here:
> > https://github.com/nddipiazza/tika-fork to parse files. This manages a
> > pool of JVMs and pushes the requests through them. It makes it so that if
> > a file is a bomb and blows up the JVM, it will not affect my program.
> >
> > However, when I use this out in the wild, I get a lot of strange timeouts
> > that I can't reproduce locally. I guess they're related to system
> > resources on those local systems, but I can't really figure out what the
> > problem is.
> >
> > So I'm thinking instead I will try out a different approach.
> >
> > I would like to have each parser node have its own Tika Server running,
> > and I'll just use the endpoint
> >
> > http://localhost:9998/unpack/all
> >
> > But I'm worried this will be plagued by the same problems that prompted me
> > to go to the tika-fork parser, where this server will continually go down
> > due to OOMs because random files that come in from the wild cause Tika
> > bombs, or CPU spikes due to infinite loops, etc.
> >
> > How is everyone else managing to do this in the field? Is there a way to
> > configure a Tika Fork parser on the Tika server so that it does not crash
> > upon zip bombs, Excel bombs, etc.?
> >
> > -Nicholas DiPiazza
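Tim's point that the client has to be resilient while a `--spawnChild` server restarts its child can be sketched on the client side. This is a minimal Python sketch, not anything shipped with Tika: it assumes tika-server listens on the `http://localhost:9998/unpack/all` endpoint from the thread and accepts the raw file bytes via an HTTP PUT. `post_to_tika` and `parse_with_retry` are hypothetical helper names, and the retry count and backoff are arbitrary. It retries only connection-level failures (refused or reset connections, timeouts), on the assumption that those mean the child is restarting; an HTTP error response means the server did answer, so it is re-raised instead of retried.

```python
import time
import urllib.error
import urllib.request

# Assumption: tika-server on its default port, using the endpoint from the thread.
UNPACK_URL = "http://localhost:9998/unpack/all"


def post_to_tika(data: bytes, url: str = UNPACK_URL, timeout: float = 120.0) -> bytes:
    """Hypothetical helper: PUT one file's bytes to tika-server, return the body."""
    req = urllib.request.Request(url, data=data, method="PUT")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def parse_with_retry(data: bytes, send=post_to_tika, max_attempts: int = 3,
                     backoff_s: float = 2.0, sleep=time.sleep) -> bytes:
    """Hypothetical helper: retry while a --spawnChild server restarts its child.

    Raises the last connection error if every attempt fails; with several
    threads sharing one server, that file is only a suspect, not proven guilty.
    """
    last_err = None
    for attempt in range(1, max_attempts + 1):
        try:
            return send(data)
        except urllib.error.HTTPError:
            # The server answered (e.g. a parse error); retrying won't help.
            raise
        except OSError as err:
            # URLError, refused/reset connections and socket timeouts are all
            # OSError subclasses; treat them as "child process restarting".
            last_err = err
            if attempt < max_attempts:
                sleep(backoff_s * attempt)  # linear backoff: 2s, 4s, ...
    raise last_err
```

Because of the multithreading problem Tim describes, a file that exhausts its retries may simply have been in flight when another file killed the server; re-sending suspects one at a time is one way to narrow it down.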
