Re: What is thread safe in Lucene.net?

Simone Chiaretta Mon, 11 Jan 2010 16:36:39 -0800

Jokin,

On Tue, Jan 12, 2010 at 1:17 AM, Jokin Cuadrado <joki...@gmail.com> wrote:


> Seems correct to me. I don't know the size of your index, or the
> frequency of the updates and searches. Common optimizations are to
> queue new documents and to warm searchers before switching them.
>
>
Thank you.
Mine is a multi-site blogging engine, so index size can vary from 1 to 50k
documents, and updates can range from one new document every week (for single
bloggers) to a few hundred new docs per day (for blogs with a lot of
bloggers).
Searches are more frequent, as every post shows related posts (using
the similarity query) and pages reached from Google have a "more results
like this" section. For popular blogs this can be 100-200k page views per day.
So it is definitely a search-heavy workload: that's why I decided to reopen the
IndexSearcher after every new document added.
What will "warming" achieve? How can I implement it?
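
My guess at what "warm, then switch" could mean is sketched below: build the
new searcher, run a throwaway query against it, and only then publish it to
the other threads. StubSearcher and SearcherHolder are hypothetical stand-ins
written only for this sketch; they are not Lucene.Net classes, and the real
cost of a cold searcher is only imitated by a flag.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for an index searcher; NOT the Lucene.Net class.
public sealed class StubSearcher
{
    public int Generation { get; }
    public bool Warmed { get; private set; }

    public StubSearcher(int generation) { Generation = generation; }

    // Stand-in for a real search; the first call marks the searcher as warm.
    public string Search(string query)
    {
        Warmed = true;
        return "gen" + Generation + ":" + query;
    }
}

public sealed class SearcherHolder
{
    private StubSearcher _current = new StubSearcher(0);
    private int _generation;

    // Search threads read the current searcher without taking a lock.
    public StubSearcher Current => Volatile.Read(ref _current);

    // Called after a document is added: build, warm, then publish.
    public void Reopen(string warmupQuery)
    {
        var fresh = new StubSearcher(Interlocked.Increment(ref _generation));
        fresh.Search(warmupQuery);                 // warm before anyone sees it
        Interlocked.Exchange(ref _current, fresh); // atomic switch
        // A real searcher would also need closing once in-flight searches finish.
    }
}
```

With this shape a search thread would call holder.Current.Search(...) and
never see a half-built or cold searcher; whether that matters at my index
sizes is exactly what I am unsure about.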


>
> By the way, the error that you have encountered in the QueryParser
> is very common. People new to Lucene usually make the same mistake,
> because you get used to sharing the different objects in Lucene (and
> it's recommended in the case of the IndexWriters and IndexReaders), so
> sharing the QueryParser seems the most logical thing. But QueryParser
> is a class generated by an automatic tool (JavaCC) and has some
> member variables that maintain the state of the parsing steps. If you have
> only one thread you can reuse it (when making different searches in
> the same request, for example), but as I said before, it's easier to
> get a new one when you need it.
>

Yeah, I'm pretty new to Lucene...and I had that exact same approach :)
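
A minimal sketch of the pattern discussed above: one shared analyzer, and a
new parser per search. StubAnalyzer, StubParser and SearchService are
hypothetical stand-ins made up for this sketch, not the Lucene.Net API; the
parser keeps its working state in a field only to imitate a generated parser.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in: holds no state, so one instance can serve all threads.
public sealed class StubAnalyzer
{
    public string[] Tokenize(string text) =>
        text.ToLowerInvariant().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
}

// Hypothetical stand-in: keeps per-parse state in a field, so a single
// instance must not be used by two threads at the same time.
public sealed class StubParser
{
    private readonly StubAnalyzer _analyzer;
    private readonly List<string> _pending = new List<string>(); // parse state

    public StubParser(StubAnalyzer analyzer) { _analyzer = analyzer; }

    public string Parse(string query)
    {
        _pending.Clear();
        foreach (var token in _analyzer.Tokenize(query)) _pending.Add(token);
        return string.Join(" AND ", _pending);
    }
}

public sealed class SearchService
{
    // Created once and shared by every request.
    private readonly StubAnalyzer _analyzer = new StubAnalyzer();

    public string BuildQuery(string userInput)
    {
        // New parser per call; it is never stored in a field of the singleton.
        var parser = new StubParser(_analyzer);
        return parser.Parse(userInput);
    }
}
```

Here new SearchService().BuildQuery("what is css url") returns
"what AND is AND css AND url", and concurrent calls cannot interfere because
each call owns its parser.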

btw: my search engine is pretty small and all enclosed in just one file.
If you are interested and want to have a look at it, here is the link to the
code file:
http://code.google.com/p/subtext/source/browse/trunk/src/Subtext.Framework/Services/SearchEngine/SearchEngineService.cs

Thank you
Simone


>
> On 1/12/10, Simone Chiaretta <simone.chiare...@gmail.com> wrote:
> > And just to make sure I did everything correctly:
> >
> > IndexWriter: I create one at app startup and always use the same
> > instance; many users can use the same instance to add new documents to
> > the index.
> > IndexSearcher: I create one at the first search and use it for all
> > searches (concurrent users can use the same one). I recreate it when
> > there is a new document in the index.
> > Analyzer: I create one at app startup (needed to create the writer)
> > and reuse it.
> > Directory: I create one at app startup to create the writer, and keep
> > this instance to create new IndexSearchers when needed.
> > QueryParser: one per search.
> >
> > Is that the correct approach?
> > Thx
> > Simone
> >
> > On Tue, Jan 12, 2010 at 12:45 AM, Jokin Cuadrado <joki...@gmail.com> wrote:
> >
> >> Correct, that's the way we use it.
> >>
> >> On 1/12/10, Simone Chiaretta <simone.chiare...@gmail.com> wrote:
> >> > OK, so I can create the query parser each time, using the analyzer I
> >> > created at the search engine startup? Correct?
> >> > Simone
> >> >
> >> > On Tue, Jan 12, 2010 at 12:28 AM, Jokin Cuadrado <joki...@gmail.com> wrote:
> >> >
> >> >> The QueryParser is not thread safe, so you must use a new one in
> >> >> every request. However, it is very lightweight, because most of the
> >> >> complexity comes from the underlying analyzer, and that one is
> >> >> thread safe.
> >> >>
> >> >> On 1/12/10, Simone Chiaretta <simone.chiare...@gmail.com> wrote:
> >> >> > I'm trying to go live with our search engine implementation based on
> >> >> > Lucene.net.
> >> >> > Unfortunately we have to keep it inside our appdomain in the web
> >> >> > application to make it work in a shared hosting scenario.
> >> >> >
> >> >> > But we are getting quite a few problems, so I was wondering if there
> >> >> > are some issues with concurrent access:
> >> >> > 1 - Is the QueryParser thread safe? Can I create one at startup and
> >> >> > reuse it in all my queries, or do I have to create one each time?
> >> >> > I'm asking because I'm getting strange errors like:
> >> >> >
> >> >> > System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
> >> >> >    at System.Collections.ArrayList.ArrayListEnumeratorSimple.MoveNext()
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Jj_add_error_token(Int32 kind, Int32 pos)
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Jj_scan_token(Int32 kind)
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Jj_3_1()
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Jj_rescan_token()
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.GenerateParseException()
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Jj_consume_token(Int32 kind)
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Clause(String field)
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Query(String field)
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Parse(String query)
> >> >> >    at Subtext.Framework.Services.SearchEngine.SearchEngineService.Search(String queryString, Int32 max, Int32 blogId, Int32 entryId)
> >> >> >
> >> >> > Which looks to me like a threading issue.
> >> >> >
> >> >> > I also got this one:
> >> >> >
> >> >> > Lucene.Net.QueryParsers.QueryParser+LookaheadSuccess: Error in the application.
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Jj_scan_token(Int32 kind)
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Jj_3R_2()
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Jj_3R_2()
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Jj_rescan_token()
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Jj_3_1()
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.GenerateParseException()
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Jj_consume_token(Int32 kind)
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Jj_consume_token(Int32 kind)
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Term(String field)
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Clause(String field)
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Clause(String field)
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Query(String field)
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Query(String field)
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Parse(String query)
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Parse(String query)
> >> >> >    at Subtext.Framework.Services.SearchEngine.SearchEngineService.Search(String queryString, Int32 max, Int32 blogId, Int32 entryId)
> >> >> >
> >> >> > And this one:
> >> >> >
> >> >> > Lucene.Net.QueryParsers.ParseException: Cannot parse 'what is css url': Encountered "what is css url" at line 1, column 0. Was expecting one of:
> >> >> >     <NOT> ...
> >> >> >     "+" ...
> >> >> >     "-" ...
> >> >> >     "(" ...
> >> >> >     "*" ...
> >> >> >     <QUOTED> ...
> >> >> >     <TERM> ...
> >> >> >     <PREFIXTERM> ...
> >> >> >     <WILDTERM> ...
> >> >> >     "[" ...
> >> >> >     "{" ...
> >> >> >     <NUMBER> ...
> >> >> >    at Lucene.Net.QueryParsers.QueryParser.Parse(String query)
> >> >> >
> >> >> > Which is fine if I really added an invalid character in the query,
> >> >> > but "what is css url" looks to me like a valid query.
> >> >> >
> >> >> > What I'm doing, to avoid creating a new query parser for each query,
> >> >> > is to "cache" a single instance as a variable inside the singleton
> >> >> > class that holds the search engine.
> >> >> > Is this a good approach, or a bad one? (I guess bad, since these all
> >> >> > seem to be threading issues.)
> >> >> > Is creating a new query parser for each query a performance problem?
> >> >> >
> >> >> > Thank you
> >> >> > Simone
> >> >> >
> >> >> > --
> >> >> > Simone Chiaretta
> >> >> > Microsoft MVP ASP.NET - ASPInsider
> >> >> > Blog: http://codeclimber.net.nz
> >> >> > RSS: http://feeds2.feedburner.com/codeclimber
> >> >> > twitter: @simonech
> >> >> >
> >> >> > Any sufficiently advanced technology is indistinguishable from magic
> >> >> > "Life is short, play hard"
> >> >> >
> >> >>
> >> >>
> >> >> --
> >> >> Jokin
> >> >>
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >> --
> >> Jokin
> >>
> >
> >
> >
> >
>
>
> --
> Jokin
>



-- 
Simone Chiaretta
Microsoft MVP ASP.NET - ASPInsider
Blog: http://codeclimber.net.nz
RSS: http://feeds2.feedburner.com/codeclimber
twitter: @simonech

Any sufficiently advanced technology is indistinguishable from magic
"Life is short, play hard"
