[ 
https://issues.apache.org/jira/browse/TIKA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485866#comment-15485866
 ] 

Luis Filipe Nassif commented on TIKA-2058:
------------------------------------------

I don't think that is the case here, at least. The allocated buffers were 
HeapByteBuffer instances, so they live on the heap and are not memory-mapped.

I looked at the FileBackedDataSource source. The DataSource is probably not 
writable, so ByteBuffer.allocate(length) is called instead of channel.map(...). 
The returned buffer is always added to buffersToClean, but that is necessary 
only for mmapped buffers. There is no need to call buffersToClean.add(dst) when 
the returned buffer is on the heap. A simple check may fix the current OOM.

I also find it strange to memory-map a portion of the file for every read. 
It is a very costly operation. It may be much better to mmap the whole file 
once and read the requested data from the returned ByteBuffer.
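
A rough sketch of the "map once, read slices" idea follows. The class name 
WholeFileMapping and the helper sliceOf are hypothetical, and this assumes a 
read-only file smaller than 2 GB, since a single FileChannel.map call cannot 
map more than Integer.MAX_VALUE bytes. Larger files would need several 
mappings.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: map the file once, then serve every read as a view.
class WholeFileMapping {
    private final MappedByteBuffer whole;

    WholeFileMapping(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // A mapping stays valid after the channel that created it is closed.
            whole = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    // Returns a view of [position, position + length) without copying and
    // without changing the position or limit of the source buffer.
    static ByteBuffer sliceOf(ByteBuffer source, long position, int length) {
        ByteBuffer view = source.duplicate();
        int start = Math.toIntExact(position);
        view.limit(start + length);
        view.position(start);
        return view.slice();
    }

    ByteBuffer read(int length, long position) {
        return sliceOf(whole, position, length);
    }
}
```

Each read then costs only a duplicate() and a slice(), and there is a single 
mapped buffer to clean up when the source is closed.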

> Memory Leak in Tika version 1.13 when parsing millions of files
> ---------------------------------------------------------------
>
>                 Key: TIKA-2058
>                 URL: https://issues.apache.org/jira/browse/TIKA-2058
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.13
>            Reporter: Tim Barrett
>         Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> We have an application using Tika which parses roughly 7,000,000 files of 
> different types, many of the files are MSG files with attachments. This works 
> correctly with Tika 1.9, and has been in production for over a year,  with 
> parsing runs taking place every few weeks. The same application runs into 
> insufficient memory problems (java heap) when using Tika 1.13.
> I have used lsof and file leak detector to track down open files, however 
> neither shows any open files when the application is running. I did find an 
> issue with open files https://issues.apache.org/jira/browse/TIKA-2015, 
> however there was a workaround for this and this is not the issue.
> I am sorry to have to report this with a level of vagueness, but with lsof 
> turning nothing up I am a bit stuck as to how to investigate further. We are 
> more than willing to help by testing on the basis of any ideas provided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
