[ 
https://issues.apache.org/jira/browse/TIKA-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16850870#comment-16850870
 ] 

Nicholas DiPiazza commented on TIKA-2575:
-----------------------------------------

Hi [~talli...@apache.org], I created a project to help me address this 
problem: https://github.com/nddipiazza/tika-fork/

Now when a file is a Tika bomb (one that causes an out-of-memory condition), 
it just crashes the forked JVM, and that process is evicted from the pool. 
Eventually a new one starts in its place. 
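
A minimal sketch of the evict-and-replace idea described above. This is not 
the actual tika-fork code: EvictingPool and its methods are hypothetical 
names, and a plain object stands in for a forked JVM so the example stays 
self-contained.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical pool: W stands in for a handle to a forked JVM.
final class EvictingPool<W> {
    private final Deque<W> idle = new ArrayDeque<>();
    private final Supplier<W> factory;
    private int started = 0;

    EvictingPool(int size, Supplier<W> factory) {
        this.factory = factory;
        for (int i = 0; i < size; i++) {
            idle.push(start());
        }
    }

    private W start() {
        started++;
        return factory.get();
    }

    // Runs the task on a pooled worker. A worker whose task fails is
    // treated as crashed: it is not returned to the pool, and a fresh
    // worker is started in its place.
    synchronized <R> R run(Function<W, R> task) {
        W worker = idle.pop();
        try {
            R result = task.apply(worker);
            idle.push(worker);   // healthy worker goes back into the pool
            return result;
        } catch (RuntimeException crash) {
            idle.push(start());  // evict the crashed worker, start a new one
            throw crash;
        }
    }

    synchronized int startedCount() { return started; }

    synchronized int idleCount() { return idle.size(); }
}
```

With a pool of two workers, one failed task leaves two idle workers and a 
started count of three. A real implementation would replace the worker 
asynchronously and detect the crash from the child process exiting rather 
than from an exception.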

We originally tried to use the Tika JAX-RS project, but because of the HTTP 
layer, buffering in Jetty, etc., it was not ideal.

So this project instead uses direct sockets to communicate between the parent 
JVM process and the forked JVMs, and it seems to be as fast as a local Tika 
parse.

Any feedback on this? It was my Memorial Day fun project, so I won't be 
offended if you think it's garbage. 

> Provide a way to abort tika parses when tika input stream buffer grows past 
> a certain threshold
> -------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2575
>                 URL: https://issues.apache.org/jira/browse/TIKA-2575
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Nicholas DiPiazza
>            Priority: Major
>         Attachments: screenshot-1.png
>
>
> Sometimes, for example, you use Tika to parse an XLS file that isn't really 
> that big, maybe 60 MB, and suddenly the JVM heap usage exceeds 800 MB, which 
> causes an OOM in my case.
> Can we add an "abort threshold" so that the Tika parse halts if the parse 
> output bytes exceed this value?
> Or is it already possible for users to do this themselves by somehow 
> watching the input stream as it grows?
>  
>  
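
One way the "abort threshold" from the issue description could look on the 
output side is sketched below. ThresholdWriter is a hypothetical class, not 
part of Tika, and it counts characters rather than bytes. It only caps what 
the parse emits; it would not bound memory that a parser allocates 
internally while reading the file.

```java
import java.io.IOException;
import java.io.Writer;

// Hypothetical wrapper: fails a write once the total output would
// exceed maxChars, so the caller can abort the parse.
final class ThresholdWriter extends Writer {
    private final Writer delegate;
    private final long maxChars;
    private long written = 0;

    ThresholdWriter(Writer delegate, long maxChars) {
        this.delegate = delegate;
        this.maxChars = maxChars;
    }

    @Override
    public void write(char[] buf, int off, int len) throws IOException {
        // Reject the whole chunk if it would cross the threshold.
        if (written + len > maxChars) {
            throw new IOException(
                "abort threshold of " + maxChars + " chars exceeded");
        }
        delegate.write(buf, off, len);
        written += len;
    }

    @Override
    public void flush() throws IOException {
        delegate.flush();
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }

    long written() {
        return written;
    }
}
```

Wrapping the output sink this way turns an oversized parse into an 
IOException that the caller can catch and treat as an aborted document.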



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
