There's a fixed-size thread pool involved in doing the indexing, of a size that depends on the machine parameters.

Karl
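
For illustration only, here is a minimal sketch of the kind of setup described above: a fixed-size pool whose size is derived from the machine, with every document handed to that pool instead of to a freshly created thread. It uses only the JDK. The class name, the choice of availableProcessors() as the "machine parameter", and the placeholder where a real IndexWriter call would go are all assumptions and not the actual application code. The point it tries to show is that the number of distinct worker threads never exceeds the pool size, so if some indexing state is kept per thread, it should stay bounded by that size too.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: not the real indexing code from this thread.
class FixedPoolIndexing {

    // Assumption: the pool is sized from the number of available cores.
    static int poolSize() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    // Submits docCount indexing tasks to a fixed pool and returns the names
    // of every worker thread that actually handled a document.
    static Set<String> indexAll(int docCount, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Set<String> workerNames = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < docCount; i++) {
            pool.execute(() -> {
                // Placeholder for the real work, e.g. adding a document
                // to a shared index writer.
                workerNames.add(Thread.currentThread().getName());
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexing tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for indexing", e);
        }
        return workerNames;
    }

    public static void main(String[] args) {
        int threads = poolSize();
        Set<String> workers = indexAll(10_000, threads);
        System.out.println("pool size: " + threads + ", distinct worker threads: " + workers.size());
    }
}
```

The contrasting pattern, creating a new Thread per document or per batch, would produce an ever-growing set of thread names in the same experiment.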
-----Original Message-----
From: ext Michael McCandless [mailto:[email protected]]
Sent: Wednesday, October 03, 2012 10:43 AM
To: Wright Karl (Nokia-LC/Boston)
Subject: Re: Lucene 4.0 memory usage during indexing - is this expected?

This is no good! Can you send an email to dev@?

This sounds very familiar ... and I had thought we committed a fix for it ... hopefully Uwe or Robert can remember what it was!

Do you create new threads frequently, to do indexing? Rather than pulling from a fixed pool?

Mike McCandless

http://blog.mikemccandless.com

On Wed, Oct 3, 2012 at 8:32 AM, <[email protected]> wrote:
> Hi Mike,
>
> I've got a technical question for you.
>
> For background, we've been building a new address search engine on top
> of Lucene 4.0. The main customization involves a chain of custom
> analyzers etc, and it all works quite well. Or at least it did until
> I added 7m more documents to the list. At that point the indexing
> process began to run out of memory, even though we were giving it some
> 20GB. Only some 12GB of that is accounted for in our part of the world.
>
> Looking at an eclipse MAT dump, the main thing that still seems to
> grow over time is/are TokenStreamComponent objects that are being held
> indirectly by org.apache.lucene.index.FieldInvertState objects. The
> number of FieldInvertState objects grows and grows. By the middle of
> the indexing process, there are 30 of these, and each one of these
> seems to hold onto one TokenStreamComponent per field. (Each
> TokenStreamComponent in turn holds onto a whole pile of things like
> ICU tokenizers etc, so there's a strong multiplicative factor
> involved, which in the end winds up holding about 10GB of memory for
> those 30 objects.)
>
> The question: Why does the number of FieldInvertState objects grow
> over time during indexing? Are these associated in some way with
> segments? Is this expected behavior?
>
> Thanks!
>
> Karl
