Hmm why is the heap dump so immense? Normally it contains the top N (eg 100) object types and their count/aggregate RAM usage.
Can you attach the infoStream output to an email (to java-user)? Mike On Fri, Apr 2, 2010 at 5:28 PM, Woolf, Ross <ross_wo...@bmc.com> wrote: > I have this and the heap dump is 63mb zipped. The info stream is much > smaller (31 kb zipped), but I don't know how to get them to you. > > We are not using the NRT readers > > -----Original Message----- > From: Michael McCandless [mailto:luc...@mikemccandless.com] > Sent: Thursday, April 01, 2010 5:21 PM > To: java-user@lucene.apache.org > Subject: Re: IndexWriter and memory usage > > Hmm, not good. Can you post a heap dump? Also, can you turn on > infoStream, index up to the OOM @ 512 MB, and post the output? > > IndexWriter should not hang onto much beyond the RAM buffer. But, it > does allocate and then recycle this RAM buffer, so even in an idle > state (having indexed enough docs to fill up the RAM buffer at least > once) it'll hold onto those 16 MB. > > Are you using getReader (to get your NRT readers)? If so, are you > really sure you're eventually closing the previous reader after > opening a new one? > > Mike > > On Thu, Apr 1, 2010 at 6:58 PM, Woolf, Ross <ross_wo...@bmc.com> wrote: >> We are seeing a situation where the IndexWriter is using up the Java Heap >> space and only releases memory for garbage collection upon a commit. We >> are using the default RAMBufferSize of 16 mb. We are using Lucene 2.9.1. We >> are set at heap size of 512 mb. >> >> We have a large number of documents that are run through Tika and then added >> to the index. The data from Tika is changed to a string, and then sent to >> Lucene. Heap dumps clearly show the data in the Lucene classes and not in >> Tika. Our intent is to only perform a commit once the entire indexing run >> is complete, but several hours into the process everything comes to a crawl. >> In using both JConsole and VisualVM we can see that the heap space is >> maxed out and garbage collection is not able to clean up any memory once we >> get into this state. It is our understanding that the IndexWriter should be >> only holding onto 16 mb of data before it flushes it, but what we are seeing >> is that while it is in fact writing data to disk when it hits the 16 mb >> limit, it is also holding onto some data in memory and not allowing garbage >> collection to take place, and this continues until garbage collection is >> unable to free up enough space to all things to move faster than a crawl. >> >> As a test we caused a commit to occur after each document is indexed and we >> see the total amount of memory reduced from nearly 100% of the Java Heap to >> around 70-75%. The profiling tools now show that the memory is cleaned up >> to some extent after each document. But of course this completely defeats >> the whole reason why we want to only commit at the end of the run for >> performance sake. Most of the data, as seen using Heap analasis, is held in >> Byte, Character, and Integer classes whos GC roots are tied back to the >> Writer Objects and threads. The instance counts, after running just 1,100 >> documents seems staggering >> >> Is there additional data that the IndexWriter hangs onto regardless of when >> it hits the RAMBufferSize limit? Why are we seeing the heap space all being >> used up? >> >> A side question to this is the fact that we always see a large amount of >> memory used by the IndexWriter even after our indexing has been completed >> and all commits have taken place (basically in an idle state). Why would >> this be? Is the only way to totally clean up the memory is to close the >> writer? Our index is also used for real time indexing so the IndexWriter is >> intended to remain open for the lifetime of the app. >> >> Any help in understanding why the IndexWriter is maxing out our heap space >> or what is expected from memory usage of the IndexWriter would be >> appreciated. >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org