Phew, thanks for bringing closure!

Mike McCandless
http://blog.mikemccandless.com

On Wed, Oct 3, 2012 at 2:12 PM, <[email protected]> wrote:
> Mystery resolved; the problem was due to an ever-increasing record size,
> which was in turn due to a record structure that was never being cleared.
> This caused it to appear as if the total allocation of structures used for
> analysis was steadily growing. But the number of such entities did NOT grow,
> which is what gave away the solution.
>
> Thanks for the hints, and sorry for the confusion.
>
> Karl
>
> -----Original Message-----
> From: Wright Karl (Nokia-LC/Boston)
> Sent: Wednesday, October 03, 2012 12:41 PM
> To: [email protected]
> Subject: RE: Lucene 4.0 memory usage during indexing - is this expected?
>
> Threads are managed via an executor service and are a fixed size thread pool,
> of size 16 on this machine.
>
> There are not a lot of fields in the schema (a half dozen). We do use
> PerFieldAnalyzerWrapper.
>
> I'm still grappling with the MAT reports; it's possible of course that we're
> holding onto something unexpected, or even that we have a fragmentation
> situation. Stay tuned.
>
> Karl
>
> -----Original Message-----
> From: ext Michael McCandless [mailto:[email protected]]
> Sent: Wednesday, October 03, 2012 11:50 AM
> To: [email protected]
> Subject: Re: Lucene 4.0 memory usage during indexing - is this expected?
>
> I wish I could remember/find the Jira issue here ... there was one fairly
> recently.
>
> Are you really sure you're not turning over threads that are coming through
> Lucene...? High thread turnover causes challenges for ThreadLocals ...
>
> Do you have a lot of fields? Are you using PerFieldAnalyzerWrapper...?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Oct 3, 2012 at 10:45 AM, <[email protected]> wrote:
>> There's a fixed-sized thread pool involved in doing the indexing, of a size
>> that depends on the machine parameters.
>>
>> Karl
>>
>> -----Original Message-----
>> From: ext Michael McCandless [mailto:[email protected]]
>> Sent: Wednesday, October 03, 2012 10:43 AM
>> To: Wright Karl (Nokia-LC/Boston)
>> Subject: Re: Lucene 4.0 memory usage during indexing - is this expected?
>>
>> This is no good!
>>
>> Can you send an email to dev@? This sounds very familiar ... and I had
>> thought we committed a fix for it ... hopefully Uwe or Robert can remember
>> what it was!
>>
>> Do you create new threads frequently, to do indexing? Rather than pulling
>> from a fixed pool?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Wed, Oct 3, 2012 at 8:32 AM, <[email protected]> wrote:
>>> Hi Mike,
>>>
>>> I've got a technical question for you.
>>>
>>> For background, we've been building a new address search engine on
>>> top of Lucene 4.0. The main customization involves a chain of custom
>>> analyzers etc, and it all works quite well. Or at least it did until
>>> I added 7m more documents to the list. At that point the indexing
>>> process began to run out of memory, even though we were giving it
>>> some 20GB. Only some 12GB of that is accounted for in our part of the
>>> world.
>>>
>>> Looking at an Eclipse MAT dump, the main thing that still seems to
>>> grow over time is/are TokenStreamComponent objects that are being
>>> held indirectly by org.apache.lucene.index.FieldInvertState objects.
>>> The number of FieldInvertState objects grows and grows. By the
>>> middle of the indexing process, there are 30 of these, and each one
>>> of these seems to hold onto one TokenStreamComponent per field.
>>> (Each TokenStreamComponent in turn holds onto a whole pile of things
>>> like ICU tokenizers etc, so there's a strong multiplicative factor
>>> involved, which in the end winds up holding about 10GB of memory for
>>> those 30 objects.)
>>>
>>> The question: Why does the number of FieldInvertState objects grow
>>> over time during indexing?
>>> Are these associated in some way with segments? Is this expected behavior?
>>>
>>> Thanks!
>>>
>>> Karl
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
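
For readers landing on this thread later: the resolution above (a reused record structure that was never cleared, so each record kept growing while the number of records stayed constant) can be illustrated with a minimal Java sketch. This is not Karl's actual code, which the thread does not show. The class `ReusedRecordSketch`, the `AnalysisRecord` type, and the sample address text are all hypothetical names chosen for illustration, and no Lucene API is involved.

```java
import java.util.ArrayList;
import java.util.List;

public class ReusedRecordSketch {

    /** Hypothetical scratch record that is reused from one document to the next. */
    static final class AnalysisRecord {
        private final List<String> tokens = new ArrayList<>();

        void add(String token) { tokens.add(token); }

        void clear() { tokens.clear(); }

        int size() { return tokens.size(); }
    }

    /** Fills the record for one document, optionally resetting it first. */
    static void analyze(AnalysisRecord record, String text, boolean clearFirst) {
        if (clearFirst) {
            record.clear(); // the step that was missing in the leaking variant
        }
        for (String token : text.split("\\s+")) {
            record.add(token);
        }
    }

    /** Processes the same 4-token document repeatedly with a single reused record. */
    static int sizeAfter(int documents, boolean clearFirst) {
        AnalysisRecord record = new AnalysisRecord(); // only one record is ever created
        for (int i = 0; i < documents; i++) {
            analyze(record, "1 Main Street Boston", clearFirst);
        }
        return record.size();
    }

    public static void main(String[] args) {
        // One record in both cases; only its size differs.
        System.out.println("without clear: " + sizeAfter(1000, false)); // 4000
        System.out.println("with clear:    " + sizeAfter(1000, true));  // 4
    }
}
```

In a heap dump the leaking variant would show the same pattern Karl describes: the count of record objects stays flat (here, one) while the memory retained by each keeps climbing. With a fixed pool of 16 threads, each holding its own reused record, the object count would stay at 16 while total retained size grows with the number of documents indexed.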
