Tim Barrett created TIKA-2058:
---------------------------------

             Summary: Memory Leak in Tika version 1.13 when parsing millions of files
                 Key: TIKA-2058
                 URL: https://issues.apache.org/jira/browse/TIKA-2058
             Project: Tika
          Issue Type: Bug
    Affects Versions: 1.13
            Reporter: Tim Barrett


We have an application using Tika that parses roughly 7,000,000 files of 
different types; many of the files are MSG files with attachments. This works 
correctly with Tika 1.9 and has been in production for over a year, with 
parsing runs taking place every few weeks. The same application runs into 
insufficient-memory problems (Java heap) when using Tika 1.13.

I have used lsof and file leak detector to track down open files; however, 
neither shows any open files while the application is running. I did find an 
issue with open files (https://issues.apache.org/jira/browse/TIKA-2015), but 
there was a workaround for it, and it is not the cause of this problem.

I am sorry to have to report this so vaguely, but with lsof turning up 
nothing I am a bit stuck as to how to investigate further. We are more than 
willing to help by testing any ideas provided.
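
For reference, here is a minimal sketch of how heap usage could be sampled 
between parsing batches, to confirm that the growth is on the Java heap rather 
than in file handles (which is what lsof reports). It uses only the standard 
java.lang.management API. The class name HeapSampler, the batch size, and the 
placeholder parse step are hypothetical and are not taken from the application 
described above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: samples JVM heap usage so it can be logged per batch.
public class HeapSampler {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    /** Returns the heap bytes currently in use, as reported by the JVM. */
    public static long usedHeapBytes() {
        MemoryUsage heap = MEMORY.getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Formats one log line for the given number of files processed so far. */
    public static String sample(long filesProcessed) {
        long usedMb = usedHeapBytes() / (1024 * 1024);
        return String.format("files=%d usedHeapMB=%d", filesProcessed, usedMb);
    }

    public static void main(String[] args) {
        long batchSize = 100_000; // assumed batch size, for illustration only
        for (long done = batchSize; done <= 3 * batchSize; done += batchSize) {
            // Placeholder: the real application would parse one batch of files here.
            MEMORY.gc(); // request a collection so the sample reflects mostly live objects
            System.out.println(sample(done));
        }
    }
}
```

If the sampled value keeps climbing across batches even after a requested 
collection, that would point to objects being retained on the heap. Starting 
the JVM with -XX:+HeapDumpOnOutOfMemoryError would then produce a heap dump 
at the point of failure that can be inspected in a heap analyzer.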



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
