Ranking with MultiSearcher -- WAS RE: Normalized Scoring -- was RE: idf and explain(), was Re: Search and Scoring

Chuck Williams Thu, 21 Oct 2004 17:37:36 -0700

A simple solution occurred to me and I couldn't resist trying it.  The attached files 
fix Daniel's test, which now returns the same scores in both cases, and they don't 
break my app.  I doubt this is the best fix -- see issues below.  These are modified 
1.4.2 source files.

The changes are:
  1.  Searcher:  Add a topmostSearcher field, with getter and setter, to record the 
outermost Searcher.  It defaults to this.
  2.  MultiSearcher:  Pass down the topmostSearcher when creating the subsearchers.
  3.  IndexSearcher:  Call Query.weight() everywhere with the topmostSearcher instead 
of this.
  4.  Query:  Provide a default implementation of Query.combine() so that 
MultiSearcher works with all queries.
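
To make changes 1-3 concrete, here is a minimal sketch of the idea.  It is not the 
attached patch and does not use Lucene's real classes: every Sketch* name, the 
idfFor() helper, and the recursive lookup in getTopmostSearcher() are assumptions 
made up for illustration.  Only the idf formula, ln(numDocs / (docFreq + 1)) + 1, 
follows the default Similarity.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins. These are NOT Lucene's real classes.
abstract class SketchSearcher {
    // Change 1: remember the outermost searcher; defaults to this.
    private SketchSearcher topmost = this;

    // Assumption: resolve recursively so nested multi-searchers also work.
    SketchSearcher getTopmostSearcher() {
        return topmost == this ? this : topmost.getTopmostSearcher();
    }

    void setTopmostSearcher(SketchSearcher s) {
        topmost = s;
    }

    abstract int docFreq(String term);

    abstract int maxDoc();

    // Same shape as the default idf: ln(numDocs / (docFreq + 1)) + 1.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Change 3: take term statistics from the topmost searcher, not from this.
    double idfFor(String term) {
        SketchSearcher stats = getTopmostSearcher();
        return idf(stats.docFreq(term), stats.maxDoc());
    }
}

// One "index": a list of whitespace-tokenized documents.
class SketchIndexSearcher extends SketchSearcher {
    private final List<String> docs;

    SketchIndexSearcher(String... docs) {
        this.docs = Arrays.asList(docs);
    }

    @Override
    int docFreq(String term) {
        int n = 0;
        for (String d : docs) {
            if (Arrays.asList(d.split(" ")).contains(term)) {
                n++;
            }
        }
        return n;
    }

    @Override
    int maxDoc() {
        return docs.size();
    }
}

// Sums the statistics of its sub-searchers.
class SketchMultiSearcher extends SketchSearcher {
    private final SketchSearcher[] subs;

    SketchMultiSearcher(SketchSearcher... subs) {
        this.subs = subs;
        // Change 2: pass the topmost searcher down to the sub-searchers.
        for (SketchSearcher s : subs) {
            s.setTopmostSearcher(this);
        }
    }

    @Override
    int docFreq(String term) {
        int n = 0;
        for (SketchSearcher s : subs) {
            n += s.docFreq(term);
        }
        return n;
    }

    @Override
    int maxDoc() {
        int n = 0;
        for (SketchSearcher s : subs) {
            n += s.maxDoc();
        }
        return n;
    }
}

class SketchDemo {
    public static void main(String[] args) {
        SketchIndexSearcher a = new SketchIndexSearcher(
                "one blah three", "one foo three", "one foobar three");
        SketchIndexSearcher b = new SketchIndexSearcher("two blah three");
        System.out.println("local idf in a:  " + a.idfFor("blah"));
        System.out.println("local idf in b:  " + b.idfFor("blah"));
        new SketchMultiSearcher(a, b);
        System.out.println("global idf in a: " + a.idfFor("blah"));
        System.out.println("global idf in b: " + b.idfFor("blah"));
    }
}
```

Before the sub-searchers are wrapped, each one computes idf from its own index; 
afterwards both use the combined docFreq and maxDoc, so a term gets the same idf no 
matter which sub-index a hit comes from.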

Problems or possible problems I see:
  1.  This does not address the same issue with RemoteSearchable.  RemoteSearchable is 
not a Searcher, nor can it be due to lack of multiple inheritance in Java, but 
Query.weight() requires a Searcher.  Perhaps Query.weight() should be changed to take 
a Searchable, but this requires changing many places and I suspect would break apps.
  2.  There may be other places that topmostSearcher should be used instead of this.
  3.  The default implementation for Query.combine() is a guess on my part - it works 
for TermQuery.  It's fragile in that the default implementation will hide bugs caused 
by queries that inadvertently omit a more precise Query.combine() method.
  4.  The prior comment on Query.combine() indicates that whoever wrote it was fully 
aware of this problem and so probably had another usage in mind, so the whole issue 
may just be Daniel's usage in the test case.  It's not apparent to me, so I probably 
don't understand something.

Chuck

  > -----Original Message-----
  > From: Chuck Williams
  > Sent: Thursday, October 21, 2004 3:11 PM
  > To: 'Lucene Developers List'
  > Subject: RE: Normalized Scoring -- was RE: idf and explain(), was Re:
  > Search and Scoring
  > 
  > The idf's are indeed computed locally, but I believe it is a simple bug
  > in MultiSearcher.  The attached version of the test adds explain()'s to
  > verify the problem is the idf's (and changes the Field construction to
  > something that works in my 1.4.2 sources).
  > 
  > MultiSearcher.search() calls the separate searchers for each index.
  > That makes the IndexSearcher the current searcher when Similarity.idf()
  > is reached.  Thus IndexSearcher.docFreq() is used instead of
  > MultiSearcher.docFreq(), yielding the index-local idf's.
  > 
  > The best fix is not obvious to me, but it is just a code-structure issue.
  > 
  > Chuck
  > 
  >   > -----Original Message-----
  >   > From: Daniel Naber [mailto:[EMAIL PROTECTED]
  >   > Sent: Thursday, October 21, 2004 2:35 PM
  >   > To: Lucene Developers List
  >   > Subject: Re: Normalized Scoring -- was RE: idf and explain(), was Re:
  >   > Search and Scoring
  >   >
  >   > On Thursday 21 October 2004 23:03, Doug Cutting wrote:
  >   >
  >   > > Idf's are already computed globally across all indexes.  Tf's are local
  >   > > to the document.  In short, scores from a MultiSearcher are the same as
  >   > > when searching an IndexReader with the same documents.
  >   >
  >   > That doesn't seem to be the case in the attached test -- am I using
  >   > MultiSearcher in the wrong way or what might be the problem?
  >   > The output of the attached test is:
  >   >
  >   > 1+2 searched with Multisearcher:
  >   > two blah three score=0.70273256
  >   > one blah three score=0.35615897
  >   > one foo three score=0.35615897
  >   > one foobar three score=0.35615897
  >   >
  >   > 1+2 indexed together:
  >   > one blah three score=0.5911608
  >   > one foo three score=0.5911608
  >   > one foobar three score=0.5911608
  >   > two blah three score=0.5911608
  >   >
  >   > --
  >   > http://www.danielnaber.de
