I'm just using queries... I'm pretty new to Lucene, so I went for the easier solution. Would you recommend using filters and caching instead of queries?
At the moment I'm on Lucene 2.3.1... would you recommend moving to 2.9?
My app has not been released yet (an open source blogging engine), but will be shortly. The number of documents indexed will range from 0 to 50,000 blog posts (our biggest installation atm). Will not optimizing after every new document reduce search performance on such indexes?

Simone

On Wed, Jan 6, 2010 at 7:08 PM, Michael Garski <[email protected]> wrote:

> Simone,
>
> Are you using any field caches or filters?
>
> In versions prior to 2.9, reopening the index will completely rebuild
> the field cache and filter bits for all documents in the index, which
> can result in an increase in memory consumption. In 2.9 and future
> versions, the field cache and filter bits are cached at a segment level,
> which results in significantly faster re-opens as only the new segments
> are loaded into the caches.
>
> Our applications use very large indexes and 2.9's segment level caching
> allows us to re-open indexes much faster while utilizing less memory in
> the process.
>
> Michael
>
> -----Original Message-----
> From: Simone Chiaretta [mailto:[email protected]]
> Sent: Wednesday, January 06, 2010 10:01 AM
> To: [email protected]
> Subject: Re: Possible memory leak in Lucene.NET 2.4?
>
> What I am doing is initializing the writer in the App_Start event of the
> web app, and closing everything at the App_End event.
> For the reader, I start it at the first search request, re-open it every
> time a new document is added, and then close it in the App_End event.
>
> If you are interested here is the search engine service I'm using:
> http://code.google.com/p/subtext/source/browse/trunk/src/Subtext.Framework/Services/SearchEngine/SearchEngineService.cs
>
> Simone
>
> On Wed, Jan 6, 2010 at 6:31 PM, Matt Honeycutt <[email protected]> wrote:
>
> > Won't the various global application events be fired if the app pool is
> > gracefully terminated/recycled? While not ideal, couldn't you initialize
> > your Lucene objects during one of the application initialization events,
> > then dispose of them in the corresponding shutdown events?
> >
> > On Wed, Jan 6, 2010 at 11:14 AM, Michael Garski <[email protected]> wrote:
> >
> > > If it's not an option to create search functionality in a separate
> > > process, such as in a shared hosting environment, you may be limited
> > > in the size of your index and how you query it. The field cache, and
> > > to a lesser extent filters, will consume a fair amount of memory that
> > > is proportional to the number of documents in the index.
> > >
> > > As others have mentioned, you will have to ensure that resources are
> > > released when the app pool recycles.
> > >
> > > Michael
> > >
> > > -----Original Message-----
> > > From: Simone Chiaretta [mailto:[email protected]]
> > > Sent: Wednesday, January 06, 2010 12:45 AM
> > > To: [email protected]
> > > Subject: Re: Possible memory leak in Lucene.NET 2.4?
> > >
> > > Unfortunately not everybody can use another process: I'm building a
> > > blog engine that must be able to run on a shared hosting provider.
> > > The 2nd process is not an option :)
> > >
> > > Simone
> > >
> > > On Tuesday, January 5, 2010, Digy <[email protected]> wrote:
> > > > As Michael stated, I also prefer not hosting "indexing and searching
> > > > services" in IIS.
> > > > There are many alternatives such as WCF, Remoting etc. With a
> > > > separate service for Lucene, you can control anything you want.
> > > >
> > > > DIGY
> > > >
> > > > -----Original Message-----
> > > > From: Michael Garski [mailto:[email protected]]
> > > > Sent: Tuesday, January 05, 2010 11:11 PM
> > > > To: [email protected]
> > > > Subject: RE: Possible memory leak in Lucene.NET 2.4?
> > > >
> > > > Jeff,
> > > >
> > > > Correct - there is no need to optimize the index after adding a
> > > > document, and I would recommend against it especially when you move
> > > > to 2.9 as you will not see any of the benefits of the changes to
> > > > composite readers such as faster incremental warm-ups to filters and
> > > > field caches.
> > > >
> > > > I've never run Lucene.Net in the context of a web process and would
> > > > actually recommend against that approach due to app pool recycling,
> > > > opting for a service that exposes search functionality via WCF.
> > > >
> > > > What types of queries are you executing? Are you using filters or
> > > > sorting? How often do you re-open the IndexReader that is used for
> > > > searching? Re-opening the reader after each document addition can be
> > > > an expensive process, especially if you are using filters and/or
> > > > sorts. How are you refreshing the IndexReader?
> > > >
> > > > Regarding the IndexReader locking files, this is a feature which
> > > > allows you to concurrently index and search on the same index and
> > > > not have to worry about the IndexWriter deleting a segment file from
> > > > underneath the searcher when a segment merge occurs.
> > > >
> > > > The first step would be to use a memory profiler to determine what
> > > > is actually consuming the memory. I use the SciTech .NET Memory
> > > > Profiler for such purposes.
> > > >
> > > > Michael
> > > >
> > > > -----Original Message-----
> > > > From: Jeff Pennal [mailto:[email protected]]
> > > > Sent: Tuesday, January 05, 2010 12:42 PM
> > > > To: [email protected]
> > > > Subject: Possible memory leak in Lucene.NET 2.4?
> > > >
> > > > Hello all,
> > > >
> > > > In doing some profiling of our Lucene code, I noticed that we were
> > > > doing an optimize after every update to our index. Though our index
> > > > is relatively small (~75MB), the optimize task still took way too
> > > > much time to run.
> > > >
> > > > I did some research and it seems like it would not be an issue to
> > > > update our index without optimizing afterwards, the side effect
> > > > being that we'd have more open file handles.
> > > >
> > > > I made that change and noticed some horrible performance side
> > > > effects.
> > > >
> > > > The first thing I noticed was that the CPU usage of our web
> > > > application (ASP.NET MVC), which reads from the index, never went
> > > > below 60-70% and was frequently pegged at 99%.
> > > >
> > > > In addition to the CPU spiking, the memory taken up by the w3wp.exe
> > > > process quickly grew to around 800MB, which is about 300MB above
> > > > normal.
> > > >
> > > > This has all the hallmarks of a memory leak somewhere.
> > > >
> > > > Finally, I noticed that the IndexReader was locking some of the
> > > > files in the index folder even though the reader was set to nolock
> > > > mode. This seemed to be the cause of the increase in the number of
> > > > files in the index folder.
> > > >
> > > > We have the IndexReader set to open once and then be shared among
> > > > every request to the web application.
> > > > My understanding is that this is the correct way to do this, and
> > > > this never caused any issues when we were optimizing the index after
> > > > every update.
> > > >
> > > > I know this is a pretty vague problem and there could be any number
> > > > of issues involved here. However, if anyone could suggest areas to
> > > > look into for possible solutions, it would be greatly appreciated.
> > > >
> > > > Thanks,
> > > > Jeff
> > >
> > > --
> > > Simone Chiaretta
> > > Microsoft MVP ASP.NET - ASPInsider
> > > Blog: http://codeclimber.net.nz
> > > RSS: http://feeds2.feedburner.com/codeclimber
> > > twitter: @simonech
> > >
> > > Any sufficiently advanced technology is indistinguishable from magic
> > > "Life is short, play hard"
