Hi,

It seems that my IndexWriter, after committing and optimizing, still has a retained size of 140 MB. See [1] for a screenshot of the heap dump analysis done with Eclipse MAT.
Of those 140 MB, 67 MB are retained by analyzer.tokenStreams.hardRefs.table.HashMap$Entry.value.tokenStream.scanner.zzBuffer. Why is this? Is it a memory leak, or did I do something wrong during indexing? (BTW, I'm indexing documents that contain Field(xxxx, Reader) instances, and those Readers are wrappers around the Readers returned by Tika.parse(xxxx). I get a lot of IOExceptions from the Tika readers, and the wrapper maps the exceptions to EOF so that Lucene doesn't see them; a rough sketch of the wrapper is at the end of this mail.)

...and 73 MB of the 140 MB are retained by docWriter, see [2]. It looks like the Field objects in the array docWriter.threadStates[0].consumer.fieldHash[1].fields[xxxx] are holding references to the Readers. Those Reader instances are actually closed after IndexWriter.updateDocument, and each one of them retains 1 MB. The question is why IndexWriter keeps references to those Readers after the documents have been indexed.

[1] http://img.skitch.com/20100407-1183815yiausisg73u9wfgscsj.jpg
[2] http://img.skitch.com/20100407-b86irkp7e4uif2wq1dd4t899qb.jpg

--
/Rubén
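P.S. For reference, the wrapper mentioned above looks roughly like this (a simplified sketch only; the class name and details here are made up, not the exact code):

import java.io.IOException;
import java.io.Reader;

// Wraps the Reader returned by Tika.parse() and maps any IOException to EOF
// so that Lucene never sees the exception. (Illustrative names and details.)
public class EofOnErrorReader extends Reader {
    private final Reader delegate;

    public EofOnErrorReader(Reader delegate) {
        this.delegate = delegate;
    }

    @Override
    public int read(char[] cbuf, int off, int len) {
        try {
            return delegate.read(cbuf, off, len);
        } catch (IOException e) {
            return -1; // report EOF instead of propagating the exception
        }
    }

    @Override
    public void close() {
        try {
            delegate.close();
        } catch (IOException e) {
            // ignore; the underlying Tika reader is already in a bad state
        }
    }
}

// Used when building the document, e.g.:
//   doc.add(new Field("content", new EofOnErrorReader(tikaReader)));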