[jira] [Commented] (TIKA-2058) Memory Leak in Tika version 1.13 when parsing millions of files

Tim Allison (JIRA) Tue, 30 Aug 2016 06:42:34 -0700

    [ https://issues.apache.org/jira/browse/TIKA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449043#comment-15449043 ]


Tim Allison commented on TIKA-2058:
-----------------------------------

The fix for TIKA-2045 had to do with font caching within PDFBox (IIRC); the 
triggering file happened to be protected against extraction, but that was 
incidental.  Also, TIKA-2045 (again IIRC) was about high memory usage per 
document, not a persistent, cross-document memory leak.  Is Tika failing 
consistently on the same documents (as in TIKA-2045)?  That is, if you run 
Tika against a single triggering file, does it use an excessive amount of 
memory?  Or does this look like a caching-type memory leak that builds up 
across documents?
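
One rough way to tell the two cases apart is to record, for each document, the approximate peak heap during parsing and the heap still in use after a GC request. The sketch below uses only the JDK's java.lang.management API. HeapProbe, profile, and parseStep are hypothetical names for illustration; parseStep stands in for whatever Tika call the application actually makes, and no Tika API is assumed here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Tika: samples heap usage around each parse.
final class HeapProbe {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    // Heap bytes in use after requesting a GC. The request is only a hint
    // to the JVM, so treat the number as an estimate.
    static long usedHeapAfterGc() {
        MEMORY.gc();
        return MEMORY.getHeapMemoryUsage().getUsed();
    }

    // Reset the per-pool peak counters so the next reading covers one document.
    static void resetPeaks() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            pool.resetPeakUsage();
        }
    }

    // Sum of the peak usage of all heap pools since the last reset. The pools
    // need not peak at the same instant, so this is an approximation.
    static long approxPeakHeap() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage peak = pool.getPeakUsage();
            if (pool.getType() == MemoryType.HEAP && peak != null) {
                total += peak.getUsed();
            }
        }
        return total;
    }

    // For each input, returns {approximate peak during parse, retained after GC}.
    static <T> long[][] profile(List<T> inputs, Consumer<? super T> parseStep) {
        long[][] samples = new long[inputs.size()][2];
        for (int i = 0; i < inputs.size(); i++) {
            resetPeaks();
            parseStep.accept(inputs.get(i));
            samples[i][0] = approxPeakHeap();
            samples[i][1] = usedHeapAfterGc();
        }
        return samples;
    }

    // Change in retained heap between the first and the last document.
    static long retainedGrowth(long[][] samples) {
        if (samples.length < 2) {
            return 0;
        }
        return samples[samples.length - 1][1] - samples[0][1];
    }
}
```

If the retained figure keeps climbing over many documents, that points toward a cross-document leak; if it stays flat while the peak is very high for particular files, that looks more like the per-document case in TIKA-2045. Either way, a heap dump would still be needed to see what is actually holding the memory.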

> Memory Leak in Tika version 1.13 when parsing millions of files
> ---------------------------------------------------------------
>
>                 Key: TIKA-2058
>                 URL: https://issues.apache.org/jira/browse/TIKA-2058
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.13
>            Reporter: Tim Barrett
>
> We have an application using Tika that parses roughly 7,000,000 files of 
> different types; many of the files are MSG files with attachments. This works 
> correctly with Tika 1.9 and has been in production for over a year, with 
> parsing runs taking place every few weeks. The same application runs into 
> insufficient memory problems (Java heap) when using Tika 1.13.
> I have used lsof and File Leak Detector to track down open files; however, 
> neither shows any open files when the application is running. I did find an 
> issue with open files, https://issues.apache.org/jira/browse/TIKA-2015; 
> however, there was a workaround for it, and it is not the issue here.
> I am sorry to have to report this so vaguely, but with lsof 
> turning nothing up I am a bit stuck as to how to investigate further. We are 
> more than willing to help by testing any ideas provided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
