Jokin,

On Tue, Jan 12, 2010 at 1:17 AM, Jokin Cuadrado <joki...@gmail.com> wrote:
> Seems correct to me. I don't know the size of your index,
> or the frequency of the updates and searches. Common optimizations
> are to queue new documents, and to warm searchers before switching
> them.
>
Thank you.
Mine is a multi-site blogging engine, so the index size can vary from 1 to 50k documents, and updates can range from one new document every week (for single bloggers) to a few hundred new documents per day (for blogs with a lot of bloggers).
Searches are a bit more frequent: every post shows related posts (using the similarity query), and pages reached from Google show "more results like this". For popular blogs this can be 100-200k page views per day.
So it is definitely search-heavy: that's why I decided to reopen the IndexSearcher after every new document is added.
What will "warming" achieve? How can I implement it?

> By the way, the error that you have encountered in the queryparser
> is very common. People new to Lucene usually make the same mistake,
> because you get used to sharing the different objects in Lucene (and
> it's recommended in the case of the IndexWriters and IndexReaders), so
> sharing the queryparser seems the most logical thing. But the queryparser
> is a class generated by an automatic tool (JavaCC) and has some member
> variables that maintain the state of the parsing steps. If you have
> only one thread you can reuse it (when making different searches in
> the same request, for example), but as I said before, it's easier to
> get a new one when you need it.
>
Yeah, I'm pretty new to Lucene... and I had that exact same approach :)
btw: my search engine is pretty small and all enclosed in just one file.
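A minimal sketch of the pattern Jokin describes (one shared, thread-safe analyzer; a fresh, cheap parser per request). It is written in Java, the language Lucene itself is written in, but it uses stand-in classes rather than the real Lucene API: `StatelessAnalyzer`, `StatefulParser`, `PerRequestParser` and the `body:` field prefix are hypothetical names for illustration only. The point is that the parser keeps per-parse state in a field, so one shared instance can break when two requests parse at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for an Analyzer: it has no mutable fields,
// so a single instance can be shared by every request thread.
final class StatelessAnalyzer {
    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }
}

// Hypothetical stand-in for a JavaCC-generated QueryParser: it keeps
// per-parse state in a field, so one instance must not be shared across threads.
final class StatefulParser {
    private final StatelessAnalyzer analyzer;
    private final List<String> pending = new ArrayList<>(); // mutable parse state

    StatefulParser(StatelessAnalyzer analyzer) { this.analyzer = analyzer; }

    String parse(String query) {
        pending.clear();
        pending.addAll(analyzer.tokenize(query));
        StringBuilder sb = new StringBuilder();
        for (String term : pending) { // another thread calling parse() here would corrupt 'pending'
            if (sb.length() > 0) sb.append(" OR ");
            sb.append("body:").append(term);
        }
        return sb.toString();
    }
}

public class PerRequestParser {
    // Created once at startup and shared, like the Analyzer in the thread.
    static final StatelessAnalyzer ANALYZER = new StatelessAnalyzer();

    // A fresh parser per request: cheap, because the heavy lifting lives in the shared analyzer.
    static String search(String queryString) {
        StatefulParser parser = new StatefulParser(ANALYZER);
        return parser.parse(queryString);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            Callable<String> task = () -> search("what is css url");
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < 1000; i++) results.add(pool.submit(task));
            for (Future<String> f : results) {
                if (!f.get().equals("body:what OR body:is OR body:css OR body:url")) {
                    throw new AssertionError("unexpected parse: " + f.get());
                }
            }
        } finally {
            pool.shutdown();
        }
        System.out.println(search("What is CSS url")); // body:what OR body:is OR body:css OR body:url
    }
}
```

Sharing one `StatefulParser` across the thread pool instead could intermittently fail (for example with `ConcurrentModificationException`, Java's counterpart of .NET's "Collection was modified" error) or silently mix terms from two queries, which is consistent with the stack traces quoted further down.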
If you are interested and want to have a look at it, here is the link to the code file:
http://code.google.com/p/subtext/source/browse/trunk/src/Subtext.Framework/Services/SearchEngine/SearchEngineService.cs

Thank you
Simone

> On 1/12/10, Simone Chiaretta <simone.chiare...@gmail.com> wrote:
> > And just to make sure I did everything correctly:
> >
> > IndexWriter: I create one at app startup and always use the same
> > instance: many users can use the same instance to add new documents to
> > the index.
> > IndexSearcher: I create one at the first search, and use it to do all the
> > searches (concurrent users can use the same one). And I recreate it when
> > there is a new document in the index.
> > Analyzer: I create one at the beginning of the app (needed to create the
> > writer) and reuse it.
> > Directory: I create one at startup to create the writer, and keep
> > this instance to create new IndexSearchers when needed.
> > QueryParser: one per search.
> >
> > Is that the correct approach?
> > Thx
> > Simone
> >
> > On Tue, Jan 12, 2010 at 12:45 AM, Jokin Cuadrado <joki...@gmail.com> wrote:
> >
> >> Correct, that's the way we use it.
> >>
> >> On 1/12/10, Simone Chiaretta <simone.chiare...@gmail.com> wrote:
> >> > OK, so I can create the query parser each time, using the analyzer I
> >> > created at the search engine startup? Correct?
> >> > Simone
> >> >
> >> > On Tue, Jan 12, 2010 at 12:28 AM, Jokin Cuadrado <joki...@gmail.com> wrote:
> >> >
> >> >> The queryparser is not thread safe, so you must use a new one in
> >> >> every request. However, it is very lightweight, because the bigger
> >> >> complexity comes from the underlying analyzer, and that one is
> >> >> thread safe.
> >> >>
> >> >> On 1/12/10, Simone Chiaretta <simone.chiare...@gmail.com> wrote:
> >> >> > I'm trying to go live with our search engine implementation based on
> >> >> > Lucene.net.
> >> >> > Unfortunately we have to keep it inside our appdomain in the web
> >> >> > application to make it work in shared hosting scenarios.
> >> >> >
> >> >> > But we are getting quite a few problems, so I was wondering if there
> >> >> > are some issues with concurrent access:
> >> >> > 1 - Is the QueryParser thread safe? Can I create one at the
> >> >> > beginning and reuse it in all my queries, or do I have to create
> >> >> > one each time?
> >> >> > I'm asking because I'm getting strange errors like:
> >> >> >
> >> >> > System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
> >> >> >   at System.Collections.ArrayList.ArrayListEnumeratorSimple.MoveNext()
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Jj_add_error_token(Int32 kind, Int32 pos)
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Jj_scan_token(Int32 kind)
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Jj_3_1()
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Jj_rescan_token()
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.GenerateParseException()
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Jj_consume_token(Int32 kind)
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Clause(String field)
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Query(String field)
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Parse(String query)
> >> >> >   at Subtext.Framework.Services.SearchEngine.SearchEngineService.Search(String queryString, Int32 max, Int32 blogId, Int32 entryId)
> >> >> >
> >> >> > Which looks to me like a threading issue.
> >> >> >
> >> >> > I also got this one:
> >> >> >
> >> >> > Lucene.Net.QueryParsers.QueryParser+LookaheadSuccess: Error in the application.
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Jj_scan_token(Int32 kind)
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Jj_3R_2()
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Jj_3R_2()
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Jj_rescan_token()
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Jj_3_1()
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.GenerateParseException()
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Jj_consume_token(Int32 kind)
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Jj_consume_token(Int32 kind)
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Term(String field)
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Clause(String field)
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Clause(String field)
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Query(String field)
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Query(String field)
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Parse(String query)
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Parse(String query)
> >> >> >   at Subtext.Framework.Services.SearchEngine.SearchEngineService.Search(String queryString, Int32 max, Int32 blogId, Int32 entryId)
> >> >> >
> >> >> > And this one:
> >> >> >
> >> >> > Lucene.Net.QueryParsers.ParseException: Cannot parse 'what is css url': Encountered
> >> >> > "what is css url" at line 1, column 0. Was expecting one of:
> >> >> >     <NOT> ...
> >> >> >     "+" ...
> >> >> >     "-" ...
> >> >> >     "(" ...
> >> >> >     "*" ...
> >> >> >     <QUOTED> ...
> >> >> >     <TERM> ...
> >> >> >     <PREFIXTERM> ...
> >> >> >     <WILDTERM> ...
> >> >> >     "[" ...
> >> >> >     "{" ...
> >> >> >     <NUMBER> ...
> >> >> >   at Lucene.Net.QueryParsers.QueryParser.Parse(String query)
> >> >> >
> >> >> > That would be fine if I had really put an invalid character in the query,
> >> >> > but "what is css url" looks like a valid query to me.
> >> >> >
> >> >> > What I'm doing, to avoid creating a new query parser for each query,
> >> >> > is to "cache" it as a variable inside the singleton class that holds
> >> >> > the search engine.
> >> >> > Is this a good approach or a bad one? (I guess bad, since these all
> >> >> > seem to be threading issues.)
> >> >> > Is creating a new query parser for each query a performance problem?
> >> >> >
> >> >> > Thank you
> >> >> > Simone
> >> >> >
> >> >> > --
> >> >> > Simone Chiaretta
> >> >> > Microsoft MVP ASP.NET - ASPInsider
> >> >> > Blog: http://codeclimber.net.nz
> >> >> > RSS: http://feeds2.feedburner.com/codeclimber
> >> >> > twitter: @simonech
> >> >> >
> >> >> > Any sufficiently advanced technology is indistinguishable from magic
> >> >> > "Life is short, play hard"
> >> >> >
> >> >>
> >> >>
> >> >> --
> >> >> Jokin
> >> >>
> >> >
> >>
> >
>
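On the "warming" question: warming usually means running a few representative queries against a newly opened searcher before it replaces the old one, so that real visitors do not pay the cost of a cold searcher on their first queries. Below is a minimal sketch in Java that combines both optimizations Jokin mentions: queued documents are written in one batch and trigger a single reopen (instead of one reopen per document), and the new searcher is warmed before being swapped in. `Snapshot`, `SnapshotManager` and `WarmSwapDemo` are hypothetical stand-ins, not Lucene or Lucene.Net APIs; the hand-rolled reference count keeps the old searcher open until in-flight searches release it. Later Lucene releases ship a similar helper, `SearcherManager`, which is worth checking before rolling your own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical stand-in for an IndexSearcher opened on a point-in-time view of the index.
final class Snapshot {
    final int docCount;
    volatile boolean warmed;
    volatile boolean closed;
    private final AtomicInteger refs = new AtomicInteger(1); // the manager's own reference

    Snapshot(int docCount) { this.docCount = docCount; }

    boolean tryIncRef() {
        int n;
        do {
            n = refs.get();
            if (n <= 0) return false; // already closed; the caller retries with the newer snapshot
        } while (!refs.compareAndSet(n, n + 1));
        return true;
    }

    void decRef() {
        if (refs.decrementAndGet() == 0) closed = true; // real code would close the searcher here
    }
}

// Hypothetical manager: queues new documents, then reopens, warms and swaps once per batch.
final class SnapshotManager {
    private final AtomicReference<Snapshot> current;
    private final Consumer<Snapshot> warmer;
    private final List<String> pending = new ArrayList<>(); // queued, not yet searchable
    private int indexedDocs;
    int refreshCount;

    SnapshotManager(Consumer<Snapshot> warmer) {
        this.warmer = warmer;
        Snapshot first = new Snapshot(0);
        warmer.accept(first);
        this.current = new AtomicReference<>(first);
    }

    // Every search borrows the current snapshot and must release it when done.
    Snapshot acquire() {
        while (true) {
            Snapshot s = current.get();
            if (s.tryIncRef()) return s;
        }
    }

    void release(Snapshot s) { s.decRef(); }

    synchronized void enqueue(String doc) { pending.add(doc); }

    // Meant to be called on a timer or after N documents (a tuning choice), not on every add.
    synchronized void flush() {
        if (pending.isEmpty()) return;
        indexedDocs += pending.size(); // stands in for AddDocument calls plus a commit
        pending.clear();
        Snapshot fresh = new Snapshot(indexedDocs);
        warmer.accept(fresh); // searches keep using the old snapshot while this runs
        Snapshot old = current.getAndSet(fresh);
        old.decRef(); // actually closed only after the last in-flight search releases it
        refreshCount++;
    }
}

public class WarmSwapDemo {
    public static void main(String[] args) {
        // A real warmer would run a few typical queries (e.g. a "related posts" query).
        SnapshotManager mgr = new SnapshotManager(s -> s.warmed = true);
        Snapshot inFlight = mgr.acquire(); // a search that started before the update
        for (int i = 0; i < 3; i++) mgr.enqueue("post " + i);
        mgr.flush();
        Snapshot latest = mgr.acquire();
        System.out.println(inFlight.docCount + " " + inFlight.closed); // 0 false
        System.out.println(latest.docCount + " " + latest.warmed);     // 3 true
        mgr.release(inFlight);
        System.out.println(inFlight.closed);                           // true
        mgr.release(latest);
    }
}
```

With the volumes described in the thread (at most a few hundred new posts per day), flushing the queue on a short timer rather than after every added document would keep the number of reopens, and therefore warm-ups, small; the right interval is a tuning choice the thread does not settle.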