Cool. I'll give this a try. Thanks!

On Thu, Jun 25, 2020 at 9:36 AM Tim Allison <[email protected]> wrote:

> Gracefully... well... give --spawnChild a try.
>
> That forks a child process that is the server. Now, unless you put a bunch
> of these behind a load balancer, your client will have to be resilient
> while the server is restarting. The other problem with this in a
> multithreaded environment is that you can't necessarily tell which file
> killed the server: threadA sends fileA, which takes a while to process;
> threadB sends fileB, which causes an OOM; the server dies before
> completing fileA, and your clients can't tell which file caused the
> problem.
>
> That said, it's what we have for robustness in tika-server.
>
> On Thu, Jun 25, 2020 at 9:41 AM Nicholas DiPiazza <[email protected]> wrote:
>
> > I have a use case for Tika server that I'm sure is pretty common.
> >
> > I have parse nodes that download files from data sources and need to
> > parse out the content and metadata from these files. But parsing needs
> > to be resilient to OOMs and needs to time out gracefully.
> >
> > Up until now, I've been using this project to parse files:
> > https://github.com/nddipiazza/tika-fork. It manages a pool of JVMs and
> > pushes the requests through them, so if a file is a bomb and blows up
> > a JVM, it does not affect my program.
> >
> > However, when I use this out in the wild, I get a lot of strange
> > timeouts that I can't reproduce locally. I guess they're related to
> > system resources on the systems in the field, but I can't figure out
> > what the problem is.
> >
> > So I'm thinking instead I will try out a different approach.
> >
> > I would like each parser node to have its own Tika Server running,
> > and I'll just use the endpoint
> >
> > http://localhost:9998/unpack/all
> >
> > But I'm worried this will be plagued by the same problems that prompted
> > me to go to the tika-fork parser: the server will continually go down
> > due to OOMs, because random files that come in from the wild are Tika
> > bombs or cause CPU spikes from infinite loops, etc.
> >
> > How is everyone else managing to do this in the field? Is there a way to
> > configure a Tika Fork parser on the Tika server so that it does not crash
> > on zip bombs, Excel bombs, etc.?
> >
> > -Nicholas DiPiazza
> >
>
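For the per-node setup described above, a client call to `/unpack/all` might look like the minimal Python sketch below. It assumes a tika-server on the default `localhost:9998` mentioned in the thread, that the endpoint accepts the raw document bytes via HTTP PUT, and that the response is a zip archive of the unpacked content. The entry names inside that archive depend on the server version, so inspect them rather than hard-coding them. `TIKA_UNPACK_URL`, `unpack_all` and `split_unpack_response` are hypothetical names for illustration, not part of Tika.

```python
import io
import urllib.request
import zipfile
from typing import Dict

# Assumption: tika-server on the default host/port used in the thread.
TIKA_UNPACK_URL = "http://localhost:9998/unpack/all"


def unpack_all(data: bytes, url: str = TIKA_UNPACK_URL, timeout: float = 120.0) -> bytes:
    """PUT raw document bytes to tika-server and return the response body.

    The timeout only bounds how long this client waits; it does not stop the
    server from continuing to work on the file.
    """
    request = urllib.request.Request(
        url,
        data=data,
        method="PUT",
        # Generic type so urllib doesn't default to a form-encoded body;
        # the server is assumed to detect the real type itself.
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def split_unpack_response(body: bytes) -> Dict[str, bytes]:
    """Map each entry name in a zip response to its bytes.

    Assumption: the response is a zip archive; entry names (embedded files,
    extracted text, metadata) vary by server version.
    """
    with zipfile.ZipFile(io.BytesIO(body)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}
```

Usage would be along the lines of `split_unpack_response(unpack_all(raw_bytes))`. Note that a client-side timeout alone does not protect the server from a file that loops forever; that still needs server-side limits or a restartable child process.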
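The resilience and attribution issues raised with `--spawnChild` can also be handled on the client side, roughly as in the following minimal Python sketch. `SuspectTracker` and `parse_with_retry` are hypothetical helpers, not part of Tika, and `send` stands in for whatever function actually PUTs a file to tika-server. Connection refusals, resets and timeouts are treated as a possible server restart and retried with a linear backoff, and every file in flight at that moment is recorded as a suspect. As the thread points out, with concurrent requests this yields a candidate set rather than the one file that killed the server.

```python
import threading
import time
import urllib.error
from typing import Callable, Set


class SuspectTracker:
    """Tracks files currently being sent to tika-server (names assumed unique)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight: Set[str] = set()
        self.suspects: Set[str] = set()

    def start(self, name: str) -> None:
        with self._lock:
            self._in_flight.add(name)

    def finish(self, name: str) -> None:
        with self._lock:
            self._in_flight.discard(name)

    def server_died(self) -> None:
        # Every file in flight when the connection dropped is a candidate.
        with self._lock:
            self.suspects |= self._in_flight


def parse_with_retry(
    name: str,
    data: bytes,
    send: Callable[[bytes], bytes],
    tracker: SuspectTracker,
    attempts: int = 3,
    backoff_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call `send`, retrying while the server child is presumably restarting."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        tracker.start(name)
        try:
            return send(data)
        except urllib.error.HTTPError:
            # The server answered with an error status; retrying the same
            # bytes is unlikely to help, so surface it to the caller.
            raise
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            # Refused/reset/timed out: treat as a possible restart.
            tracker.server_died()
            if attempt == attempts:
                raise
            sleep(backoff_seconds * attempt)  # linear backoff between tries
        finally:
            tracker.finish(name)
```

The `sleep` parameter is injectable so the retry logic can be exercised without waiting. Files that show up in `suspects` repeatedly across crashes are the better candidates for quarantining.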
