[
https://issues.apache.org/jira/browse/TIKA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484180#comment-15484180
]
Tim Barrett commented on TIKA-2058:
-----------------------------------
I added a dependency on the latest PDFBox snapshot to the top of the POM for
our own project, ahead of the dependency on Tika itself. So I was not building
Tika; I was declaring a PDFBox dependency that Tika also has, which overrode
the version Tika pulls in. I checked the classpath and the list of jars under
Maven Dependencies in the Eclipse project: only the latest PDFBox snapshot jar
was included. I disabled PDF parsing in our own project with our own logic:
basically, if the file name ends with pdf, don't parse it!
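
A minimal sketch of that kind of file-name check, for illustration only. The
class and method names are hypothetical and not part of Tika. Ignoring case
and skipping null names are assumptions, not details of the actual logic.

```java
import java.util.Locale;

// Hypothetical helper, not part of Tika: decides from the file name alone
// whether a file should be handed to the parser at all.
final class PdfSkipFilter {

    private PdfSkipFilter() {
        // static helper, no instances
    }

    // Returns false for names ending in ".pdf" (case-insensitive, which is
    // an assumption) and for null names; returns true for everything else.
    static boolean shouldParse(String fileName) {
        if (fileName == null) {
            return false;
        }
        return !fileName.toLowerCase(Locale.ROOT).endsWith(".pdf");
    }
}
```

The caller would test each file name with PdfSkipFilter.shouldParse(name)
before invoking the parser, so PDFs never reach the PDF parser.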
How do you want me to send you the heap dump? Should I zip it and give you a
btsync key?
> Memory Leak in Tika version 1.13 when parsing millions of files
> ---------------------------------------------------------------
>
> Key: TIKA-2058
> URL: https://issues.apache.org/jira/browse/TIKA-2058
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.13
> Reporter: Tim Barrett
>
> We have an application using Tika which parses roughly 7,000,000 files of
> different types; many of them are MSG files with attachments. This works
> correctly with Tika 1.9, and has been in production for over a year, with
> parsing runs taking place every few weeks. The same application runs into
> insufficient-memory problems (Java heap) when using Tika 1.13.
> I have used lsof and file leak detector to track down open files, but
> neither shows any open files when the application is running. I did find an
> issue with open files (https://issues.apache.org/jira/browse/TIKA-2015), but
> there was a workaround for that, and it is not the issue here.
> I am sorry to have to report this with a level of vagueness, but with lsof
> turning nothing up I am a bit stuck as to how to investigate further. We are
> more than willing to help by testing on the basis of any ideas provided.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)