Tim Allison commented on TIKA-2058:

Great!  Please let us know what else you find.

As a public service announcement, though: we fixed this one OOM, but I can 
guarantee that there will be others and, even better, permanent hangs!  
They're rare, and we fix them when we can, but your code has to be robust 
against those.  
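
One way to get that robustness is to run each parse in a child process with 
a hard timeout, so that an OOM or a permanent hang takes down only the child 
and never the main application. The following is a minimal sketch using only 
the JDK (Java 9+), not Tika's own API; the class name IsolatedParse, the 
Outcome enum and the worker command are hypothetical names chosen for 
illustration.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs one parse job in a separate process so that
// an OOM or a hang in the parser cannot affect the calling JVM.
class IsolatedParse {

    /** Outcome of one isolated parse attempt. */
    public enum Outcome { OK, FAILED, TIMED_OUT }

    /**
     * Starts the given command, waits at most timeoutSeconds for it to
     * finish, and kills it if it is still running after that.
     */
    public static Outcome run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Merge stderr into stdout and discard both, so the child can
        // never block on a full output pipe.
        pb.redirectErrorStream(true);
        pb.redirectOutput(ProcessBuilder.Redirect.DISCARD);
        Process child = pb.start();

        // waitFor returns false if the timeout elapsed first (a hang).
        if (!child.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            child.destroyForcibly();
            child.waitFor(); // reap the killed process
            return Outcome.TIMED_OUT;
        }
        // A non-zero exit code covers crashes in the child, including a
        // child JVM that died with an uncaught OutOfMemoryError.
        return child.exitValue() == 0 ? Outcome.OK : Outcome.FAILED;
    }
}
```

A caller might pass a command such as "java -Xmx512m -cp app.jar 
com.example.ParseOneFile <file>" (a hypothetical worker class), log any file 
that ends in FAILED or TIMED_OUT, and move on to the next one. Tika's 
ForkParser and tika-server are existing out-of-process options along the 
same lines.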

Yes, please do keep letting us know what you find.  We have a 3 million file 
regression corpus (TIKA-1302) hosted by Rackspace, but we will always benefit 
from more people trying to break it.

Thank you for opening this issue and working with us (well, with 
[~lfcnassif]) to solve this!

> Memory Leak in Tika version 1.13 when parsing millions of files
> ---------------------------------------------------------------
>                 Key: TIKA-2058
>                 URL: https://issues.apache.org/jira/browse/TIKA-2058
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.13
>            Reporter: Tim Barrett
>         Attachments: Yourkit screenshot.png, poi-3.15-beta1-p1.jar, 
> poi-3.15-beta1-p1.pom, prevents-OOM-when-writable-is-false.patch, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png
> 
> We have an application using Tika which parses roughly 7,000,000 files of 
> different types; many of the files are MSG files with attachments. This works 
> correctly with Tika 1.9, and has been in production for over a year, with 
> parsing runs taking place every few weeks. The same application runs into 
> insufficient memory problems (Java heap) when using Tika 1.13.
> 
> I have used lsof and file leak detector to track down open files; however, 
> neither shows any open files when the application is running. I did find an 
> issue with open files (https://issues.apache.org/jira/browse/TIKA-2015), 
> but there was a workaround for this and this is not the issue.
> 
> I am sorry to have to report this with a level of vagueness, but with lsof 
> turning nothing up I am a bit stuck as to how to investigate further. We are 
> more than willing to help by testing on the basis of any ideas provided.

This message was sent by Atlassian JIRA