Hi Yonik, Can we do the same for Lucene, the problem is combining the rewritten queries using the broken method in Query?
As far as I know, the problem is that e.g. MTQs rewrite *per searcher* so each searcher uses a different rewritten query (with different terms). So the scores are totally different even with a tf-idf patch (Fuzzy scores on MultiSearcher and Solr are totally wrong because each shard uses another rewritten query). To work around that, the Query class has a broken, broken, broken, broken, broken method to combine queries, which violates DeMorgans laws when there are e.g. negative clauses. And this method cannot be fixed to work with all queries (https://issues.apache.org/jira/browse/LUCENE-2756). ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -----Original Message----- > From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik > Seeley > Sent: Monday, November 22, 2010 6:29 PM > To: java-user@lucene.apache.org > Subject: Re: best practice: 1.4 billions documents > > On Mon, Nov 22, 2010 at 12:17 PM, Uwe Schindler <u...@thetaphi.de> wrote: > > The latest discussion was more about MultiReader vs. MultiSearcher. > > > > But you are right, 1.4 B documents is not easy to go, especially when > > you index grows and you get to the 2.1 B marker, then no MultiSearcher > > or whatever helps. > > > > On the other hand even distributed Solr has the same problems like > > MultiSearcher: scoring MultiTermQueries (Fuzzy) doesn't work correctly > > Are you referring to the idf being local to the shard instead of global to the > whole colleciton? > Andrzej has a patch in the works, but it's not committed yet. > > > negative MTQ clauses may produce wrong results if the query rewriting > > is done like in MultiSearcher (which is unsolveable broken and broken > > and broken and again broken for some queries as Boolean clauses - see > > DeMorgan laws). > > I don't think this is a problem for Solr. Queries are executed on each shard as > normal (no difference from a non-distributed query). > > -Yonik > http://www.lucidimagination.com > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org