And one more thing: there is no need to put a lock around the portions of code that add documents to the index and that do the search, only around the ones that create the searcher and the indexer, correct?
Simo
On Tuesday, January 12, 2010, Simone Chiaretta <simone.chiare...@gmail.com> wrote:
> Jokin,
>
> On Tue, Jan 12, 2010 at 1:17 AM, Jokin Cuadrado <joki...@gmail.com> wrote:
>
> Seems correct to me, though I don't know the size of your index
> or the frequency of the updates and searches. Common optimizations
> are to queue new documents, and to warm searchers before switching
> them.
>
>
> Thank you.
> Mine is a multi-site blogging engine, so index size can vary from 1 to 50k
> documents, and updates can range from one new document every week (for single
> bloggers) to a few hundred new docs per day (for blogs with a lot of bloggers).
> Searches are a bit more frequent, as every post has a related post (using the
> similarity query) and pages accessed from Google have a "more results like
> this". And for popular blogs this can be 100-200k page views per day. So
> definitely a search-heavy approach: that's why I decided to reopen the
> IndexSearcher after every new document added.
> What will "warming" achieve? How can I implement it?
>
>
> By the way, the error that you have encountered in the queryparser
> is very common; people new to Lucene usually make the same mistake,
> because you get used to sharing the different objects in Lucene (and
> it's recommended in the case of the indexwriters and indexreaders), so
> sharing the queryparser seems the most logical thing. But queryparser
> is a class generated by an automatic tool (javacc) and has some local
> variables that maintain the state of the parsing steps. If you have
> only one thread you can reuse it (when making different searches in
> the same request, for example), but as I said before, it's easier to
> get a new one when you need it.
>
> Yeah, I'm pretty new to Lucene...and I had that exact same approach :)
>
> btw: my search engine is pretty small and all enclosed in just one file.
> If you are interested and want to have a look at it, here is the link to the
> code file:
> http://code.google.com/p/subtext/source/browse/trunk/src/Subtext.Framework/Services/SearchEngine/SearchEngineService.cs
>
> Thank you
> Simone
>
>
> On 1/12/10, Simone Chiaretta <simone.chiare...@gmail.com> wrote:
>> And just to make sure I did everything correctly:
>>
>> IndexWriter: I create one at the app startup and always use the same
>> instance: many users can use the same instance to add new documents to the index
>> IndexSearcher: I create one at the first search, and use it to do all the
>> searches (concurrent users can use the same one). And I recreate it when
>> there is a new document in the index
>> Analyzer: I create one at the beginning of the app (needed to create the
>> writer) and reuse it
>> Directory: I create one at the beginning of time to create the writer and keep
>> this instance to create new indexsearchers when needed
>> QueryParser: one per search
>>
>> Is that the correct approach?
>> Thx
>> Simone
>>
>> On Tue, Jan 12, 2010 at 12:45 AM, Jokin Cuadrado <joki...@gmail.com> wrote:
>>
>>> Correct, that's the way we use it.
>>>
>>> On 1/12/10, Simone Chiaretta <simone.chiare...@gmail.com> wrote:
>>> > OK, so I can create the query parser each time, using the analyzer I created
>>> > at the search engine startup? Correct?
>>> > Simone
>>> >
>>> > On Tue, Jan 12, 2010 at 12:28 AM, Jokin Cuadrado <joki...@gmail.com> wrote:
>>> >
>>> >> The queryparser is not thread safe, so you must use a new one in
>>> >> every request; however, it is very lightweight, because the bigger
>>> >> complexity comes from the underlying analyzer, and that one is
>>> >> thread safe.
>>> >>
>>> >> On 1/12/10, Simone Chiaretta <simone.chiare...@gmail.com> wrote:
>>> >> > I'm trying to go live with our search engine implementation based on
>>> >> > Lucene.net.
>>> >> > Unfortunately we have to keep it inside our appdomain in the web application
>>> >> > to make it work in a shared hosting scenario.
>>> >> >
>>> >> > But we are getting quite a few problems, so I was wondering if there are
>>> >> > some issues with concurrent access:
>>> >> > 1 - is the QueryParser thread safe? Can I make one at the beginning of
>>> >> > time and reuse it in all my queries? Or do I have to create one each
>>> >> > time?
>>> >> > I'm asking because I'm getting strange errors like:
>>> >> >
>>> >> > System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
>>> >> >    at System.Collections.ArrayList.ArrayListEnumeratorSimple.MoveNext()
>>> >> >    at Lucene.Net.QueryParsers.QueryParser.Jj_add_error_token(Int32 kind, Int32 pos)
>>> >> >    at Lucene.Net.QueryParsers.QueryParser.Jj_scan_token(Int32 kind)
>>> >> >    at Lucene.Net.QueryParsers.QueryParser.Jj_3_1()
>>> >> >    at Lucene.Net.QueryParsers.QueryParser.Jj_rescan_token()
>>> >> >    at Lucene.Net.QueryParsers.QueryParser.GenerateParseException()
>>> >> >    at Lucene.Net.QueryParsers.QueryParser.Jj_consume_token(Int32 kind)
>>> >> >    at Lucene.Net.QueryParsers.QueryParser.Clause(String field)
>>> >> >    at Lucene.Net.QueryParsers.QueryParser.Query(String field)
>>> >> >    at Lucene.Net.QueryParsers.QueryParser.Parse(String query)
>>> >> >    at Subtext.Framework.Services.SearchEngine.SearchEngineService.Search(String queryString, Int32 max, Int32 blogId, Int32 entryId)
>>> >> >
>>> >> > Which looks to me like a threading issue.
>>> >> >
>>> >> > I also got this
>
> --
> Simone Chiaretta
> Microsoft MVP ASP.NET - ASPInsider
> Blog: http://codeclimber.net.nz
> RSS: http://feeds2.feedburner.com/codeclimber
> twitter: @simonech
>
> Any sufficiently advanced technology is indistinguishable from magic
> "Life is short, play hard"
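
For readers following the thread, here is a minimal sketch of the object lifetimes discussed above: one parser per search, one shared searcher for all concurrent searches, and a lock only around the code that creates and swaps the searcher. It is written in plain Java with stand-in classes so that it runs without any library. `SketchParser`, `SketchSearcher`, `SketchSearchService` and the "warmup" query are all hypothetical names invented for illustration; none of them is Lucene or Lucene.Net API, and the substring matching is only a placeholder for a real index. Warming is shown in its simplest form, assuming the goal is to run a query against the new searcher before it is published so that the first real request does not hit a cold instance.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a parser that keeps parse state in a field, so it is not thread safe. */
final class SketchParser {
    private final List<String> tokens = new ArrayList<>(); // mutable state shared by every parse() call

    List<String> parse(String query) {
        tokens.clear();
        for (String t : query.trim().toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return new ArrayList<>(tokens); // hand back a copy of the parsed terms
    }
}

/** Hypothetical stand-in for a searcher: an immutable snapshot taken when it is created. */
final class SketchSearcher {
    private final List<String> snapshot;

    SketchSearcher(List<String> docs) {
        this.snapshot = Collections.unmodifiableList(new ArrayList<>(docs));
    }

    /** Returns the documents that contain every term (case-insensitive substring match). */
    List<String> search(List<String> terms) {
        List<String> hits = new ArrayList<>();
        for (String doc : snapshot) {
            String lower = doc.toLowerCase();
            boolean matchesAll = true;
            for (String term : terms) {
                if (!lower.contains(term)) {
                    matchesAll = false;
                    break;
                }
            }
            if (matchesAll) {
                hits.add(doc);
            }
        }
        return hits;
    }
}

final class SketchSearchService {
    // Stand-in for the single shared writer; assumed to be safe for concurrent adds.
    private final List<String> index = new CopyOnWriteArrayList<>();
    // The searcher currently published to readers.
    private final AtomicReference<SketchSearcher> current =
            new AtomicReference<>(new SketchSearcher(Collections.<String>emptyList()));
    private final Object reopenLock = new Object();

    void addDocument(String doc) {
        index.add(doc);   // no service-level lock around the add itself
        reopenSearcher(); // search-heavy setup: reopen after every new document
    }

    private void reopenSearcher() {
        synchronized (reopenLock) { // only creation and swap are serialized
            SketchSearcher fresh = new SketchSearcher(index);
            fresh.search(Collections.singletonList("warmup")); // warm before publishing
            current.set(fresh); // readers see either the old or the new searcher
        }
    }

    List<String> search(String query) {
        SketchParser parser = new SketchParser(); // one parser per search, never shared
        return current.get().search(parser.parse(query));
    }
}
```

With this layout, `new SketchSearchService()` followed by a few `addDocument(...)` calls and concurrent `search(...)` calls needs no lock in the search path: each search reads the published searcher once and uses its own parser.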