You are in trouble if you use MultiTermQuery subclasses as negative clause in a BooleanQuery, e.g a range like "-[A TO B]" or even NumericRanges or Wildcards. The query will then incorrect results.
----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -----Original Message----- > From: Ganesh [mailto:emailg...@yahoo.co.in] > Sent: Thursday, November 25, 2010 9:55 AM > To: java-user@lucene.apache.org > Subject: Re: best practice: 1.4 billions documents > > Thanks for the input. > > My results are sorted by date and i am not much bothered about score. Will i > still be in trouble? > > Regards > Ganesh > > > ----- Original Message ----- > From: "Robert Muir" <rcm...@gmail.com> > To: <java-user@lucene.apache.org> > Sent: Thursday, November 25, 2010 1:45 PM > Subject: Re: best practice: 1.4 billions documents > > > On Thu, Nov 25, 2010 at 2:58 AM, Uwe Schindler <u...@thetaphi.de> wrote: > > ParallelMultiSearcher as subclass of MultiSearcher has the same problems. > These are not crashes, but more that some queries do not return correct scored > results for some queries. This effects especially all MultiTermQueries > (TermRange, Fuzzy, NumericRange, Wildcard, Prefix) if they are used in a > negative fashion (using MUST_NOT resp. "-" in QueryParser). For all of those > queries except Fuzzy, you are safe if you use > CONSTANT_SCORE_REWRITE_METHOD (using setRewriteMethod). The same > problems apply for span queries. For *all* Fuzzy Queries (negative or not), > the > scores are simply wrong and so scoring is broken with (Parallel)MultiSearcher; > wrong results are only returned when negative clauses! > > > > you can use constant score rewrite method with fuzzy, too. then it > will work "correctly" (even negative) with multisearcher too. but it > will be slow, with unbounded number of results, and the fuzziness will > not affect the scoring. (this is what constant score rewrite implies) > > the reason i say "correctly" is that for all of these queries, > constant score rewrite is just a general workaround, and might still > be incorrect. This is because many queries often have special cases > where they rewrite to simpler things and in general the MultiSearcher > combine() logic is broken here, so there might be more problems. > > > A new class ParallelIndexSearcher could help with that, when it parallelizes > multiple segments, this is still in planning phase. The difference to > ParallelMultiSearcher would be that it takes a "single" IndexReader (e.g. a > MultiReader in your case) and parallelizes per segment/segment bunches. > > > > Besides the inherited broken-ness from multisearcher, > parallelmultisearcher is broken further because it requires you to > organize your index structure in a special way to get concurrency. > > This is all pretty silly though, since ParallelMultiSearcher on a > single machine isn't going to increase QPS, so how useful really is it > in general??? > > we should deprecate both the broken Multi & ParallelMulti Searchers > and never look back. > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > Send free SMS to your Friends on Mobile from your Yahoo! Messenger. > Download Now! http://messenger.yahoo.com/download.php > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org