Hmm, why is the heap dump so immense?  Normally it contains the top N
(e.g. 100) object types and their count/aggregate RAM usage.
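
A compact histogram of that kind can be produced from inside the JVM
rather than from a full dump.  This is only a minimal sketch, assuming a
HotSpot JVM (Java 8 or later) that exposes the DiagnosticCommand MBean.
The class name HeapHistogram and the method topClasses are made up for
illustration, and nothing here is Lucene API.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class HeapHistogram {

    /**
     * Returns up to n rows of the JVM's class histogram, largest first.
     * Assumption: a HotSpot JVM that registers the
     * com.sun.management:type=DiagnosticCommand MBean.
     */
    static List<String> topClasses(int n) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName dcmd = new ObjectName("com.sun.management:type=DiagnosticCommand");

        // Same data as "jcmd <pid> GC.class_histogram"; no extra arguments.
        String out = (String) server.invoke(
                dcmd,
                "gcClassHistogram",
                new Object[] { new String[0] },
                new String[] { String[].class.getName() });

        List<String> rows = new ArrayList<String>();
        for (String line : out.split("\\r?\\n")) {
            if (rows.size() >= n) {
                break;
            }
            String t = line.trim();
            // Data rows look like "1:  <instances>  <bytes>  <class name>";
            // the header and the trailing total line do not match.
            if (t.matches("\\d+:\\s+\\d+\\s+\\d+\\s+.*")) {
                rows.add(t);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        for (String row : topClasses(100)) {
            System.out.println(row);
        }
    }
}
```

The output is a few KB of text, which is easy to attach to an email.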

Can you attach the infoStream output to an email (to java-user)?

Mike

On Fri, Apr 2, 2010 at 5:28 PM, Woolf, Ross <ross_wo...@bmc.com> wrote:
> I have this and the heap dump is 63 MB zipped.  The info stream is much
> smaller (31 KB zipped), but I don't know how to get them to you.
>
> We are not using the NRT readers.
>
> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Thursday, April 01, 2010 5:21 PM
> To: java-user@lucene.apache.org
> Subject: Re: IndexWriter and memory usage
>
> Hmm, not good.  Can you post a heap dump?  Also, can you turn on
> infoStream, index up to the OOM @ 512 MB, and post the output?
>
> IndexWriter should not hang onto much beyond the RAM buffer.  But, it
> does allocate and then recycle this RAM buffer, so even in an idle
> state (having indexed enough docs to fill up the RAM buffer at least
> once) it'll hold onto those 16 MB.
>
> Are you using getReader (to get your NRT readers)?  If so, are you
> really sure you're eventually closing the previous reader after
> opening a new one?
>
> Mike
>
> On Thu, Apr 1, 2010 at 6:58 PM, Woolf, Ross <ross_wo...@bmc.com> wrote:
>> We are seeing a situation where the IndexWriter is using up the Java heap
>> space and only releases memory for garbage collection upon a commit.  We
>> are using the default RAMBufferSize of 16 MB.  We are using Lucene 2.9.1.
>> Our heap size is set to 512 MB.
>>
>> We have a large number of documents that are run through Tika and then added 
>> to the index.  The data from Tika is changed to a string, and then sent to 
>> Lucene.  Heap dumps clearly show the data in the Lucene classes and not in 
>> Tika.  Our intent is to only perform a commit once the entire indexing run 
>> is complete, but several hours into the process everything slows to a crawl.
>> Using both JConsole and VisualVM, we can see that the heap space is
>> maxed out and garbage collection is not able to clean up any memory once we
>> get into this state.  It is our understanding that the IndexWriter should
>> only be holding onto 16 MB of data before it flushes it.  What we are seeing
>> is that while it does in fact write data to disk when it hits the 16 MB
>> limit, it also holds onto some data in memory and does not allow garbage
>> collection to take place.  This continues until garbage collection is
>> unable to free up enough space to allow things to move faster than a crawl.
>>
>> As a test we caused a commit to occur after each document is indexed and we 
>> see the total amount of memory reduced from nearly 100% of the Java Heap to 
>> around 70-75%.  The profiling tools now show that the memory is cleaned up 
>> to some extent after each document.  But of course this completely defeats
>> the whole reason why we want to only commit at the end of the run, for
>> performance's sake.  Most of the data, as seen using heap analysis, is held
>> in Byte, Character, and Integer classes whose GC roots are tied back to the
>> Writer objects and threads.  The instance counts, after running just 1,100
>> documents, seem staggering.
>>
>> Is there additional data that the IndexWriter hangs onto regardless of when 
>> it hits the RAMBufferSize limit?  Why are we seeing the heap space all being 
>> used up?
>>
>> A side question is that we always see a large amount of memory used by the
>> IndexWriter even after our indexing has been completed and all commits have
>> taken place (basically in an idle state).  Why would this be?  Is closing
>> the writer the only way to totally clean up the memory?  Our index is also
>> used for real-time indexing, so the IndexWriter is intended to remain open
>> for the lifetime of the app.
>>
>> Any help in understanding why the IndexWriter is maxing out our heap space 
>> or what is expected from memory usage of the IndexWriter would be 
>> appreciated.
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
