Re: Possible memory leak in Lucene.NET 2.4?

Simone Chiaretta Wed, 06 Jan 2010 11:11:19 -0800

Hi Michael,

more below


On Wed, Jan 6, 2010 at 7:41 PM, Michael Garski <[email protected]> wrote:

> Simone,
>
> Filters work to constrain the query to the subset of documents that are
> contained in the filter, which can improve performance.


OK, from what I see, filtering can help me exclude posts from other
blogs.
But can filters change with every query?
What's the difference between:
a query for "xyz" restricted to blog 1, run over the whole index
vs.
a query for "xyz" run over the index filtered by blog 1?
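
To make the two options concrete, here is a rough toy model in Python. It
is only a sketch under loud assumptions: plain sets stand in for posting
lists and filter bits, and every name in it (DOCS, postings, blog_filter,
and so on) is made up for illustration and is not a Lucene.NET API. It
shows that both approaches return the same documents; what differs is that
the restriction is recomputed per query in the first case and computed
once and reused in the second.

```python
# Toy model only: plain Python sets stand in for posting lists and
# filter bits. None of these names are Lucene.NET APIs.

DOCS = {
    0: {"blog": 1, "text": "xyz in the first blog"},
    1: {"blog": 2, "text": "xyz in the second blog"},
    2: {"blog": 1, "text": "nothing relevant here"},
    3: {"blog": 1, "text": "more xyz for blog one"},
}


def postings(term):
    """Return the ids of all documents whose text contains the term."""
    return {doc_id for doc_id, doc in DOCS.items() if term in doc["text"].split()}


def search_with_clause(term, blog_id):
    """Option 1: the blog restriction is one more clause of the query,
    so it is evaluated again on every search."""
    blog_clause = {doc_id for doc_id, doc in DOCS.items() if doc["blog"] == blog_id}
    return postings(term) & blog_clause


_filter_cache = {}


def blog_filter(blog_id):
    """Option 2: the set of allowed documents is computed once per blog
    and reused by every later query against that blog."""
    if blog_id not in _filter_cache:
        _filter_cache[blog_id] = frozenset(
            doc_id for doc_id, doc in DOCS.items() if doc["blog"] == blog_id
        )
    return _filter_cache[blog_id]


def search_with_filter(term, blog_id):
    """Run the term query, then keep only documents allowed by the filter."""
    return postings(term) & blog_filter(blog_id)
```

In this model a filter can certainly change with every query (one cached
entry per blog id); the saving only appears when the same filter is reused
across many queries. Whether that holds in a real index depends on how the
filter is cached, which the toy does not cover.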


> The field cache
> is used to cache values if you are sorting by something other than the
> score, such as by date or some other value in the index.
>


I'm just sorting by score... so probably not needed


>
> Optimizing after each document incurs an unnecessary overhead as all
> segments are merged into one, which is not necessary even in versions
> prior to 2.9.
>

Great, thank you. I can remove this, which should speed up adding documents
to large indexes.
And since in a web app the pool recycles every day or so anyway, doing an
optimize when the index writer is created will be enough, correct?


> If your app has not yet been released, I would suggest using 2.9 and
> ensuring you are not using any methods or properties marked with the
> Obsolete attribute to streamline migration to future versions.


Great, thank you again. 2.9 is the trunk, right? I don't see a tag for
it in SVN.


> Another
> change in 2.9 you could take advantage of is retrieving an IndexReader
> from the IndexWriter through the GetReader method, which will save you
> from having to have both a writer and a reader in application scope.
> The writer could be held at the application level and the reader
> retrieved from it directly.
>

And will that give me the most current reader, including the latest added docs?

One last thing:
Would you be so kind (if you have time, and with proper credit given in
the source code and in the release notes) as to do a source code review
of the blog's search engine?
Thanks

Simone


>
> Michael
>
>
> -----Original Message-----
> From: Simone Chiaretta [mailto:[email protected]]
> Sent: Wednesday, January 06, 2010 10:28 AM
> To: [email protected]
> Subject: Re: Possible memory leak in Lucene.NET 2.4?
>
> I'm just using queries... I'm pretty new to Lucene, so I went for the
> easier solution.
> Would you recommend using filters and caching instead of queries?
>
> At the moment I'm on Lucene 2.3.1... would you recommend moving to 2.9?
> My app has not been released yet (an open source blogging engine), but
> will be shortly.
> The number of documents indexed will range from 0 to 50,000 blog posts
> (our biggest installation atm).
>
> Will not optimizing after every new document reduce the performance of
> searches on such indexes?
>
> Simone
>
> On Wed, Jan 6, 2010 at 7:08 PM, Michael Garski <[email protected]> wrote:
>
> > Simone,
> >
> > Are you using any field caches or filters?
> >
> > In versions prior to 2.9, reopening the index will completely rebuild
> > the field cache and filter bits for all documents in the index, which
> > can result in an increase in memory consumption.  In 2.9 and future
> > versions, the field cache and filter bits are cached at a segment
> > level, which results in significantly faster re-opens as only the new
> > segments are loaded into the caches.
> >
> > Our applications use very large indexes and 2.9's segment level
> > caching allows us to re-open indexes much faster while utilizing less
> > memory in the process.
> >
> > Michael
> >
> > -----Original Message-----
> > From: Simone Chiaretta [mailto:[email protected]]
> > Sent: Wednesday, January 06, 2010 10:01 AM
> > To: [email protected]
> > Subject: Re: Possible memory leak in Lucene.NET 2.4?
> >
> > What I am doing is initializing the writer in the App_Start event of
> > the web app, and closing everything at the App_End event.
> > For the reader, I start it at the first search request, re-open it
> > every time a new document is added, and then close it in the App_End
> > event.
> >
> > If you are interested, here is the search engine service I'm using:
> > http://code.google.com/p/subtext/source/browse/trunk/src/Subtext.Framework/Services/SearchEngine/SearchEngineService.cs
> >
> > Simone
> >
> > On Wed, Jan 6, 2010 at 6:31 PM, Matt Honeycutt <[email protected]> wrote:
> >
> > > Won't the various global application events be fired if the app pool
> > > is gracefully terminated/recycled?  While not ideal, couldn't you
> > > initialize your Lucene objects during one of the application
> > > initialization events, then dispose of them in the corresponding
> > > shutdown events?
> > >
> > > On Wed, Jan 6, 2010 at 11:14 AM, Michael Garski <[email protected]> wrote:
> > >
> > > > If it's not an option to create search functionality in a separate
> > > > process, such as in a shared hosting environment, you may be limited
> > > > in the size of your index and how you query it.  The field cache, and
> > > > to a lesser extent filters, will consume a fair amount of memory that
> > > > is proportional to the number of documents in the index.
> > > >
> > > > As others have mentioned, you will have to ensure that resources are
> > > > released when the app pool recycles.
> > > >
> > > > Michael
> > > >
> > > > -----Original Message-----
> > > > From: Simone Chiaretta [mailto:[email protected]]
> > > > Sent: Wednesday, January 06, 2010 12:45 AM
> > > > To: [email protected]
> > > > Subject: Re: Possible memory leak in Lucene.NET 2.4?
> > > >
> > > > Unfortunately not everybody can use another process: I'm building a
> > > > blog engine that must be able to run on a shared hosting provider.
> > > > A second process is not an option :)
> > > >
> > > > Simone
> > > >
> > > > On Tuesday, January 5, 2010, Digy <[email protected]> wrote:
> > > > > As Michael stated, I also prefer not hosting "indexing and
> > > > > searching services" in IIS.
> > > > > There are many alternatives such as WCF, Remoting etc. With a
> > > > > separate service for Lucene, you can control anything you want.
> > > > >
> > > > > DIGY
> > > > >
> > > > > -----Original Message-----
> > > > > From: Michael Garski [mailto:[email protected]]
> > > > > Sent: Tuesday, January 05, 2010 11:11 PM
> > > > > To: [email protected]
> > > > > Subject: RE: Possible memory leak in Lucene.NET 2.4?
> > > > >
> > > > > Jeff,
> > > > >
> > > > > Correct - there is no need to optimize the index after adding a
> > > > > document, and I would recommend against it especially when you
> > > > > move to 2.9 as you will not see any of the benefits of the changes
> > > > > to composite readers such as faster incremental warm-ups to filters
> > > > > and field caches.
> > > > >
> > > > > I've never run Lucene.Net in the context of a web process and
> > > > > would actually recommend against that approach due to app pool
> > > > > recycling, opting for a service that exposed search functionality
> > > > > via WCF.
> > > > >
> > > > > What types of queries are you executing? Are you using filters or
> > > > > sorting?  How often do you re-open the IndexReader that is used for
> > > > > searching?  Re-opening the reader after each document addition can
> > > > > be an expensive process, especially if you are using filters and/or
> > > > > sorts.  How are you refreshing the IndexReader?
> > > > >
> > > > > Regarding the IndexReader locking files, this is a feature which
> > > > > allows you to concurrently index and search on the same index and
> > > > > not have to worry about the IndexWriter deleting a segment file from
> > > > > underneath the searcher when a segment merge occurs.
> > > > >
> > > > > The first place to look would be to use a memory profiler to
> > > > > determine what is actually consuming the memory.  I use the SciTech
> > > > > .NET Memory Profiler for such purposes.
> > > > >
> > > > > Michael
> > > > >
> > > > > -----Original Message-----
> > > > > From: Jeff Pennal [mailto:[email protected]]
> > > > > Sent: Tuesday, January 05, 2010 12:42 PM
> > > > > To: [email protected]
> > > > > Subject: Possible memory leak in Lucene.NET 2.4?
> > > > >
> > > > > Hello all,
> > > > >
> > > > > In doing some profiling of our Lucene code, I noticed that we were
> > > > > running an optimize after every update to our index. Though our
> > > > > index is relatively small (~75MB), the optimize task still took way
> > > > > too much time to run.
> > > > >
> > > > > I did some research and it seems like it would not be an issue to
> > > > > update our index without optimizing afterwards, the side effect
> > > > > being that we'd have more open file handles.
> > > > >
> > > > > I made that change and noticed some horrible performance side
> > > > > effects.
> > > > >
> > > > > The first thing I noticed was that the CPU for our web application
> > > > > (ASP.NET MVC) that reads from the index never went below 60-70% and
> > > > > was frequently pegged at 99%.
> > > > >
> > > > > In addition to the CPU spiking, the memory taken up by the w3wp.exe
> > > > > process quickly grew to around 800MB, which is about 300MB above
> > > > > normal.
> > > > >
> > > > > This has all the hallmarks of a memory leak somewhere.
> > > > >
> > > > > Finally, I noticed that the IndexReader was locking some of the
> > > > > files in the index folder even though the reader was set to nolock
> > > > > mode. This seemed to be the cause of the increase in the number of
> > > > > files in the index folder.
> > > > >
> > > > > We have the IndexReader set to open once and then be shared among
> > > > > every request to the web application. My understanding is that this
> > > > > is the correct way to do this, and this never caused any issues when
> > > > > we were optimizing the index after every update.
> > > > >
> > > > > I know this is a pretty vague problem and there could be any number
> > > > > of issues involved here. However, if anyone could suggest areas to
> > > > > look into for possible solutions, it would be greatly appreciated.
> > > > >
> > > > > Thanks,
> > > > > Jeff
> > > > >
> > > > >
> > > > >
> > > >
> > > > --
> > > > Simone Chiaretta
> > > > Microsoft MVP ASP.NET - ASPInsider
> > > > Blog: http://codeclimber.net.nz
> > > > RSS: http://feeds2.feedburner.com/codeclimber
> > > > twitter: @simonech
> > > >
> > > > Any sufficiently advanced technology is indistinguishable from magic
> > > > "Life is short, play hard"
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> >
> >
>
>
>
>


-- 
Simone Chiaretta
Microsoft MVP ASP.NET - ASPInsider
Blog: http://codeclimber.net.nz
RSS: http://feeds2.feedburner.com/codeclimber
twitter: @simonech

Any sufficiently advanced technology is indistinguishable from magic
"Life is short, play hard"
