Re: speed of BooleanQueries on 2.9

eks dev Wed, 15 Jul 2009 11:36:25 -0700

> Weird.  Have you run CheckIndex?
Nope, I guess it would not tell us much: we built the index twice, and the bug
is triggered by changing one parameter that controls only search, so the index
is probably not corrupt?


You think we should give it a try? Hell, why not :)

What do you mean by "Can you do a binary search to locate the term(s) that's 
causing it?"

I know exactly which term combination causes it: it is the last Query.toString() I
sent. If I simplify the Query by dropping one term together with its expansions,
it runs fine, and if I replace any of these terms it also runs fine. We tried
higher-frequency terms, lower-frequency terms... everything is fine. Bizarre.
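
For what it's worth, a "binary search" over the terms could be sketched roughly
as below. This is only a minimal sketch in plain Java (assuming Java 8 or
later), not anything from Lucene: the class TermBisect, the method minimize and
the isSlow predicate are hypothetical names. In practice isSlow would rebuild
the BooleanQuery from the remaining terms, run it, and report whether it
exceeded some time budget; that part is assumed and left out here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class TermBisect {

    /**
     * Shrinks the given term list to a smaller list for which isSlow
     * still returns true. isSlow is a hypothetical check supplied by the
     * caller (for example: build the query from these terms and time it).
     */
    public static List<String> minimize(List<String> terms,
                                        Predicate<List<String>> isSlow) {
        List<String> current = new ArrayList<>(terms);
        // Start by trying to drop half of the terms, then smaller chunks.
        int chunk = current.size() / 2;
        while (chunk >= 1) {
            int start = 0;
            while (start < current.size()) {
                int end = Math.min(start + chunk, current.size());
                // Candidate = current list without the chunk [start, end).
                List<String> candidate =
                        new ArrayList<>(current.subList(0, start));
                candidate.addAll(current.subList(end, current.size()));
                if (!candidate.isEmpty() && isSlow.test(candidate)) {
                    // Still slow without this chunk: drop it for good.
                    current = candidate;
                } else {
                    // The chunk is needed to reproduce: keep it, move on.
                    start = end;
                }
            }
            chunk /= 2;
        }
        return current;
    }
}
```

Starting from the full list of expansions, this repeatedly drops chunks of
terms as long as the remaining query is still slow, and should end with a small
set of terms that still reproduces the problem.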

 




----- Original Message ----
> From: Michael McCandless <luc...@mikemccandless.com>
> To: java-user@lucene.apache.org
> Sent: Wednesday, 15 July, 2009 19:57:09
> Subject: Re: speed of BooleanQueries on 2.9
> 
> OK thanks for the updates.  Yes, we are on the hunt now ;)  Something
> nasty is lurking...
> 
> Weird.  Have you run CheckIndex?
> 
> Can you do a binary search to locate the term(s) that's causing it?
> 
> It's great you see 10% speedup in searching overall (excluding these ones...)!
> 
> Mike
> 
> On Wed, Jul 15, 2009 at 1:49 PM, eks dev wrote:
> >
> >
> > 1. Please forget minNumberShouldMatch; it is NOT set on this particular
> > query (minNumberShouldMatch is determined dynamically, depending on the
> > semantics of the user query... sometimes it triggers, sometimes not...).
> > This exact Query causes the search to take longer than 180 seconds with
> > allowDocsOutOfOrder = true, and less than 70 ms with false. Repeatable?!? No
> > gc() effects involved... On 2.4 it does not happen; it works fine with both
> > true/false for allowDocsOutOfOrder.
> >
> > 2. Re your test: that is exactly what makes me wonder. We also see average
> > performance almost 10% better on 2.9 (even on this index, when we exclude
> > these stuck searches), but on this particular index our customer's QA
> > managed to find these "stuck requests".
> >
> > 3. If I change the tokens involved, in an identically structured Query, it
> > runs fine => the problem is somehow term-dependent (bah!)
> >
> > Please understand that I do not have direct access to this index, which
> > makes debug cycles slightly longer. Typically I give them some jars, they
> > run them and send me the logs back... Sorry for inaccuracies in the
> > description, but I am sure there is a problem in Lucene... We tried it with
> > Luke as well, on a freshly built index, and we see exactly the same behavior
> > (no bugs in our app that could cause it, except maybe wrong Lucene usage
> > somewhere).
> >
> >
> > Hard, but please stay with me, we will fix one ugly bug :)
> >
> >
> >
> >
> >
> >
> >
> > ----- Original Message ----
> >> From: Michael McCandless 
> >> To: java-user@lucene.apache.org
> >> Sent: Wednesday, 15 July, 2009 19:27:24
> >> Subject: Re: speed of BooleanQueries on 2.9
> >>
> >> But, that query can't accept a minNumberShouldMatch -- are you really
> >> setting that?  (You get 0 results if you set it, because the top
> >> boolean query has a single required clause).  Maybe you set it only on
> >> the inner large OR-query?  (But then I don't see the ~2 on that inner
> >> clause).
> >>
> >> I've tested a 21 term OR query, with allowDocsOutOfOrder true,
> >> numHits=200 on a Wikipedia index that matches 10M docs and I'm seeing
> >> the same perf on trunk & 2.4.
> >>
> >> Mike
> >>
> >> On Wed, Jul 15, 2009 at 11:41 AM, eks dev wrote:
> >> >
> >> > Sorry for the confusion, here is the exact query that runs forever with
> >> > setAllowDocsOutOfOrder. You see it in the stack trace taken while "stuck":
> >> > o.a.l.search.TopScoreDocCollector$OutOfOrderTopScoreDocCollector.collect(UnknownSource)
> >> >
> >> >
> >> > Query: +(((NAME:maria NAME:marae^0.25171682 NAME:marai^0.2365632
> >> > NAME:marao^0.2365632 NAME:marau^0.2365632 NAME:marea^0.2834352
> >> > NAME:marei^0.25171682 NAME:mareo^0.25171682 NAME:mareu^0.25171682
> >> > NAME:marie^0.28577283 NAME:marieh^0.2451648 NAME:mariha^0.2583552
> >> > NAME:mariu^0.27189124 NAME:marja^0.2834352 NAME:marje^0.2673408
> >> > NAME:marji^0.25171682 NAME:marjo^0.25171682 NAME:marju^0.25171682
> >> > NAME:marla^0.2673408 NAME:marle^0.25171682 NAME:marli^0.2365632
> >> > NAME:marlo^0.2365632 NAME:maroa^0.2673408 NAME:maroe^0.25171682
> >> > NAME:maroi^0.2365632 NAME:marou^0.2365632 NAME:marua^0.2673408
> >> > NAME:marue^0.25171682 NAME:marui^0.2365632 NAME:maruo^0.2365632
> >> > NAME:marye^0.2673408 NAME:maryi^0.25171682 NAME:maryo^0.25171682
> >> > NAME:meria^0.2787888 NAME:miria^0.25835523 NAME:moria^0.25835523
> >> > NAME:muria^0.25835523 NAME:naria^0.27648002 NAME:narie^0.25392002
> >> > NAME:neria^0.25392002) (NAME:piekarski NAME:bekarski^0.19200002
> >> > NAME:beugarski^0.20281483 NAME:blacharski^0.19200002
> >> > NAME:lekarski^0.19200002 NAME:pecarski^0.21294187 NAME:peikarski^0.27648002
> >> > NAME:pekarska^0.20172001 NAME:pekarski^0.22446752 NAME:pekarskj^0.21294187
> >> > NAME:pekarsky^0.21294187 NAME:pickarske^0.21168004 NAME:pickarski^0.22073482
> >> > NAME:piekalski^0.23941332 NAME:piekanski^0.23941332 NAME:piekaraka^0.22533335
> >> > NAME:piekarsci^0.29205337 NAME:piekarska^0.28421336 NAME:piekarskie^0.25392002
> >> > NAME:piekarsky^0.29205337 NAME:piekarzcyk^0.23232001 NAME:piekarzki^0.29205337
> >> > NAME:piekaski^0.24843001 NAME:piekavska^0.22533335 NAME:piekorski^0.28421336
> >> > NAME:pielarski^0.22997928 NAME:pierarski^0.22997928 NAME:pierkarski^0.24661335
> >> > NAME:piesarski^0.22997928 NAME:pietarski^0.22997928 NAME:pietkarski^0.24661335
> >> > NAME:pikarski^0.23232001 NAME:piowarski^0.20281483 NAME:pirkarski^0.22073482
> >> > NAME:plocharski^0.21168004 NAME:pokarski^0.20172001 NAME:polikarski^0.20172001
> >> > NAME:pukarski^0.20172001 NAME:pyekarska^0.26508 NAME:siekarski^0.20281483))^2.0)
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > ----- Original Message ----
> >> >> From: Michael McCandless
> >> >> To: java-user@lucene.apache.org
> >> >> Sent: Wednesday, 15 July, 2009 17:16:23
> >> >> Subject: Re: speed of BooleanQueries on 2.9
> >> >>
> >> >> So now I'm confused.  Since your query has required (+) clauses, the
> >> >> setAllowDocsOutOfOrder should have no effect, on either 2.4 or trunk.
> >> >>
> >> >> BooleanQuery only uses BooleanScorer when there are no required terms,
> >> >> and allowDocsOutOfOrder is true.  So I can't explain why you see this
> >> >> setting changing anything on this query...
> >> >>
> >> >> Mike
> >> >>
> >> >> On Tue, Jul 14, 2009 at 7:04 PM, eks dev wrote:
> >> >> >
> >> >> > I do not know exactly why, but when I call
> >> >> > BooleanQuery.setAllowDocsOutOfOrder(true); I have the problem, but with
> >> >> > setAllowDocsOutOfOrder(false); no problems whatsoever.
> >> >> >
> >> >> > Not really a scientific method for finding such a bug, but it does the job
> >> >> > and makes me happy.
> >> >> >
> >> >> > Empirical: "deprecated methods are not to be taken as thoroughly tested,
> >> >> > as they have short life expectancy"
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > ----- Original Message ----
> >> >> >> From: eks dev
> >> >> >> To: java-user@lucene.apache.org
> >> >> >> Sent: Wednesday, 15 July, 2009 0:24:43
> >> >> >> Subject: Re: speed of BooleanQueries on 2.9
> >> >> >>
> >> >> >>
> >> >> >> Mike, we are definitely hitting something with this one!
> >> >> >>
> >> >> >> We had a report from our QA chaps that our servers got stuck (the limit is
> >> >> >> 180 seconds per request)... We average 14 requests per second... It has
> >> >> >> nothing to do with gc(), as we can repeat it with a freshly restarted
> >> >> >> searcher.
> >> >> >>
> >> >> >> - It happens on less than 0.1% of queries, without much of a pattern, but
> >> >> >> it is repeatable on our index...
> >> >> >> It is always a combination of two expanded tokens (we use
> >> >> >> minimumNumberShouldMatch)...
> >> >> >>
> >> >> >> (+(t1 [up to 40 expansions]) +(t2 [up to 40 expansions of t2]))
> >> >> >> All tokens have a boost set, and minNumShouldMatch is set to two.
> >> >> >>
> >> >> >> I cannot provide a self-contained test, nor the index (it contains
> >> >> >> sensitive data and is rather big, ~5G).
> >> >> >>
> >> >> >> I can repeat this test on t1 and t2 with 40 expansions each. Even if I take
> >> >> >> the most frequent tokens in the collection, it runs well under one
> >> >> >> second... but these two particular tokens with their "expansions" make it
> >> >> >> run forever...
> >> >> >>
> >> >> >> And yes, if I run t1 plus expansions only, it runs super fast; the same
> >> >> >> for t2.
> >> >> >>
> >> >> >> java 1.4U14, tried with 1.6U6, no changes...
> >> >> >>
> >> >> >> Will report if I dig something out.
> >> >> >>
> >> >> >> partial stack trace while "stuck", cpu is on max:
> >> >> >>
> >> >> >>
> >> >> >> org.apache.lucene.search.TopScoreDocCollector$OutOfOrderTopScoreDocCollector.collect(Unknown Source)
> >> >> >> org.apache.lucene.search.BooleanScorer.score(Unknown Source)
> >> >> >> org.apache.lucene.search.BooleanScorer.score(Unknown Source)
> >> >> >> org.apache.lucene.search.IndexSearcher.search(Unknown Source)
> >> >> >> org.apache.lucene.search.IndexSearcher.search(Unknown Source)
> >> >> >> org.apache.lucene.search.Searcher.search(Unknown Source)
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> ----- Original Message ----
> >> >> >> > From: eks dev
> >> >> >> > To: java-user@lucene.apache.org
> >> >> >> > Sent: Monday, 13 July, 2009 13:28:45
> >> >> >> > Subject: Re: speed of BooleanQueries on 2.9
> >> >> >> >
> >> >> >> > Hi Mike,
> >> >> >> >
> >> >> >> > getMaxNumOfCandidates() in the test was 200; the index is optimised and
> >> >> >> > read-only.
> >> >> >> >
> >> >> >> > We found (due to an error in our warm-up code, funny) that only this Query
> >> >> >> > runs slower on 2.9.
> >> >> >> >
> >> >> >> > A hint where to look could be that this Query contains the two most
> >> >> >> > frequent tokens in two particular fields,
> >> >> >> > NAME:hans and ZIPS:berlin (the index has ca. 80 million very short
> >> >> >> > documents and 3 million unique terms).
> >> >> >> >
> >> >> >> > But all of this *could be just wrong measurement*; I just could not spend
> >> >> >> > more time to get to the bottom of this. We moved forward as we got overall
> >> >> >> > better average performance (a sweet 10% on average) on a much bigger real
> >> >> >> > query log from our regression test.
> >> >> >> >
> >> >> >> > Anyhow, I just wanted to throw it out, maybe it triggers some synapses :)
> >> >> >> > If it is a false alarm, sorry.
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > ----- Original Message ----
> >> >> >> > > From: Michael McCandless
> >> >> >> > > To: java-user@lucene.apache.org
> >> >> >> > > Sent: Monday, 13 July, 2009 11:50:48
> >> >> >> > > Subject: Re: speed of BooleanQueries on 2.9
> >> >> >> > >
> >> >> >> > > This is not expected; 2.9 has had a number of changes that ought to
> >> >> >> > > reduce CPU cost of searching.  If this holds up we definitely need to
> >> >> >> > > get to the root cause.
> >> >> >> > >
> >> >> >> > > Did your test exclude the warmup query for both 2.4.1 & 2.9?  How many
> >> >> >> > > segments in the index?  What is the actual value of
> >> >> >> > > getMaxNumOfCandidates()?  If you simplify the query down (eg just do
> >> >> >> > > the NAME clause or the ZIPS clause, alone) are those also 4X slower?
> >> >> >> > >
> >> >> >> > > Mike
> >> >> >> > >
> >> >> >> > > On Sun, Jul 12, 2009 at 12:53 PM, eks dev wrote:
> >> >> >> > > >
> >> >> >> > > > Is it possible that the same BooleanQuery runs significantly slower on
> >> >> >> > > > 2.9 than on 2.4?
> >> >> >> > > >
> >> >> >> > > > We see some strange effects where the following query runs approx. 4
> >> >> >> > > > (ouch!) times slower on 2.9; the test executes the same Query 1000
> >> >> >> > > > times... But! If I run the test from a real Query log with mixed
> >> >> >> > > > Queries, I get almost the same results (?!), even slightly faster on
> >> >> >> > > > 2.9 !?
> >> >> >> > > >
> >> >> >> > > >
> >> >> >> > > > Query:
> >> >> >> > > > +((NAME:hans NAME:hahns^0.23232001 NAME:hams^0.27648002 NAME:hamz^0.25392
> >> >> >> > > > NAME:hanas^0.18722998 NAME:hanbs^0.18722998 NAME:hanfs^0.18722998
> >> >> >> > > > NAME:hangs^0.18722998 NAME:hanhs^0.24030754 NAME:hanis^0.18722998
> >> >> >> > > > NAME:hanjs^0.18722998 NAME:hanks^0.18722998 NAME:hanms^0.18722998
> >> >> >> > > > NAME:hanos^0.18722998 NAME:hanrs^0.18722998 NAME:hansb^0.20172001
> >> >> >> > > > NAME:hansd^0.20172001 NAME:hansf^0.20172001 NAME:hansg^0.20172001
> >> >> >> > > > NAME:hansi^0.20172001 NAME:hansj^0.20172001 NAME:hansk^0.20172001
> >> >> >> > > > NAME:hansl^0.20172001 NAME:hansn^0.20172001 NAME:hanso^0.20172001
> >> >> >> > > > NAME:hansp^0.20172001 NAME:hanst^0.20172001 NAME:hansu^0.20172001
> >> >> >> > > > NAME:hansw^0.20172001 NAME:hansy^0.20172001 NAME:hansz^0.20172001
> >> >> >> > > > NAME:hants^0.18722998 NAME:hanus^0.18722998 NAME:hanws^0.18722998
> >> >> >> > > > NAME:hehns^0.20172001 NAME:hens^0.2736075 NAME:hins^0.24843
> >> >> >> > > > NAME:hons^0.24843 NAME:huhns^0.1801875 NAME:huns^0.24843)^2.0)
> >> >> >> > > > +(((ZIPS:berlin ZIPS:barlin^0.28227 ZIPS:berien^0.25947002
> >> >> >> > > > ZIPS:berling^0.23232001 ZIPS:perlin^0.26133335))^1.2)
> >> >> >> > > >
> >> >> >> > > > The question is just to get some hints where I should look...
> >> >> >> > > >
> >> >> >> > > > Both fields are without norms, omitTf(true), RAMDirectory, using
> >> >> >> > > > TopDocs top = ixSearcher.search(q, null, getMaxNumOfCandidates());
> >> >> >> > > > and BooleanQuery.setAllowDocsOutOfOrder(true);
> >> >> >> > > >
> >> >> >> > > > Maybe we made some mistakes in measuring, but we did simple timing
> >> >> >> > > > here on the search() method... strange. I would bet it is something we
> >> >> >> > > > did, but I cannot see where...
> >> >> >> > > >
> >> >> >> > > >
> >> >> >> > > >
> >> >> >> > > >
> >> >> >> > > >
> >> >> >> > > >
> >> >> >> > > > ---------------------------------------------------------------------
> >> >> >> > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> >> >> > > > For additional commands, e-mail: java-user-h...@lucene.apache.org





