RE: A question about scoring function in Lucene

Chuck Williams Wed, 15 Dec 2004 21:26:04 -0800

I'll try to address all the comments here.

The normalization I proposed a while back on lucene-dev is specified.
Its properties can be analyzed, so there is no reason to guess about
them.

Re. Hoss's example and analysis, yes, I believe it can be demonstrated
that the proposed normalization would make certain absolute statements
like x and y meaningful.  However, it is not a panacea -- there would be
some limitations in these statements.

To see what could be said meaningfully, it is necessary to recall a
couple of detailed aspects of the proposal:
  1.  The normalization would not change the ranking order or the ratios
among scores in a single result set from what they are now.  Only two
things change:  the query normalization constant, and the ad hoc final
normalization in Hits is eliminated because the scores are intrinsically
between 0 and 1.  Another way to look at this is that the sole purpose
of the normalization is to set the score of the highest-scoring result.
Once this score is set, all the other scores are determined since the
ratios of their scores to that of the top-scoring result do not change
from today.  Put simply, Hoss's explanation is correct.
  2.  There are multiple ways to normalize and achieve property 1.  One
simple approach is to set the top score based on the boost-weighted
percentage of query terms it matches (assuming, for simplicity, the
query is an OR-type BooleanQuery).  So if all boosts are the same, the
top score is the percentage of query terms matched.  If there are
boosts, then these cause the terms to have a corresponding relative
importance in the determination of this percentage.
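
To make the two points above concrete, here is a minimal Python sketch
of the simple scheme.  It is only an illustration under stated
assumptions, not code from the proposal or from Lucene: the function
names, the dict-of-boosts representation of an OR-type query, and the
raw score values are all hypothetical.

```python
def top_score(boosts, matched_terms):
    """Boost-weighted fraction of query terms matched by the top document.

    boosts: hypothetical mapping of query term -> boost (OR-type query).
    matched_terms: set of query terms the top-scoring document matches.
    """
    total = sum(boosts.values())
    if total <= 0:
        raise ValueError("boosts must sum to a positive value")
    matched = sum(b for term, b in boosts.items() if term in matched_terms)
    return matched / total


def renormalize(raw_scores, target_top):
    """Scale raw scores so the highest equals target_top.

    Every score is multiplied by the same factor, so the ranking order
    and the ratios between scores are unchanged (property 1).
    """
    if not raw_scores:
        return []
    highest = max(raw_scores)
    if highest <= 0:
        return [0.0 for _ in raw_scores]
    factor = target_top / highest
    return [s * factor for s in raw_scores]


# Equal boosts, top document matches one of two terms: top score is 0.5.
target = top_score({"doug": 1.0, "cutting": 1.0}, {"doug"})
# Made-up raw scores; only the scale changes, not the ratios.
scaled = renormalize([1.8, 0.6, 0.42], target)
```

With unequal boosts, say 3.0 and 1.0, matching only the more heavily
boosted term would set the top score to 0.75 instead of 0.5.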

More complex normalization schemes would go further and allow the tf's
and/or idf's to play a role in the determination of the top score -- I
didn't specify details here and am not sure how good a thing that would
be to do.  So, for now, let's just consider the properties of the simple
boost-weighted-query-term percentage normalization.

Each query in Hoss's example, "Doug Cutting" and "Chris Hostetter",
could be interpreted either as a single PhraseQuery or as a two-term
BooleanQuery.  Considering both of these cases illustrates the
absolute-statement properties and limitations of the proposed
normalization.

If they are single PhraseQuery's, then the top score will always be 1.0
assuming the phrase matches (while the other results have arbitrary
fractional scores based on the tf-idf ratios, as today).  If the queries
are BooleanQuery's with no boosts, then the top score would be 1.0 or
0.5 depending on whether both terms or only one were matched.  This is
meaningful.

In Lucene today, the top score is not meaningful.  It will always be 1.0
if the highest intrinsic score is >= 1.0.  I believe this could happen,
for example, in a two-term BooleanQuery that matches only one term (if
the tf on the matched document for that term is high enough).
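
The effect described above can be sketched in a few lines of Python.
This is an illustration of the behavior as I have described it, not
Lucene's actual Hits code; the function name and the raw scores are
made up for the example.

```python
def hits_style_normalize(raw_scores):
    """Sketch of the ad hoc final normalization described above.

    Scores are scaled down only when the highest raw score exceeds 1.0,
    so any such result set ends up with a top score of exactly 1.0.
    """
    top = max(raw_scores, default=0.0)
    norm = 1.0 / top if top > 1.0 else 1.0
    return [s * norm for s in raw_scores]


# Hypothetical raw scores: a document matching only one of two terms
# (with a high tf) and a document matching both terms.
one_term_match = hits_style_normalize([2.4, 1.2])
both_terms_match = hits_style_normalize([3.0, 0.6])
# Both result sets now have a top score of 1.0, so the top score alone
# cannot distinguish a partial match from a full one.
```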

So, to be concrete, a score of 1.0 with the proposed normalization
scheme would mean that all query terms are matched, while today a score
of 1.0 doesn't really tell you anything.  Certain absolute statements
can therefore be made with the new scheme.  This makes the
absolute-threshold monitored search application possible, along with the
segregating and filtering applications I've previously mentioned (call
out good results and filter out bad results by using absolute
thresholds).
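
As a rough sketch of the segregating and filtering idea, assuming
scores that are already normalized so that their absolute values are
meaningful: the function name and both threshold values below are
hypothetical and would need tuning for a real application.

```python
def segregate(results, good_threshold=0.75, min_threshold=0.25):
    """Split (doc_id, score) pairs using absolute score thresholds.

    Returns (called_out, ordinary); results scoring below min_threshold
    are filtered out entirely.  Both thresholds are illustrative values.
    """
    called_out, ordinary = [], []
    for doc_id, score in results:
        if score >= good_threshold:
            called_out.append(doc_id)
        elif score >= min_threshold:
            ordinary.append(doc_id)
        # Anything below min_threshold is dropped as a bad result.
    return called_out, ordinary


# A search with one strong result, one middling result, one poor result.
called_out, ordinary = segregate([(1, 1.0), (2, 0.5), (3, 0.1)])
# A search where every result is poor returns two empty lists, which
# the UI can present differently from a search with good results.
nothing_good = segregate([(4, 0.2), (5, 0.1)])
```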

These analyses are simplified by using only BooleanQuery's, but I
believe the properties carry over generally.

Doug also asked about research results.  I don't know of published
research on this topic, but I can repeat an experience from
InQuira.  We found that end users benefited from a search experience
where good results were called out and bad results were downplayed or
filtered out.  And we managed to achieve this with absolute thresholding
through careful normalization (of a much more complex scoring
mechanism).  To get a better intuitive feel for this, think about how you
react to a search where all the results suck, but there is no visual
indication of this that is any different from a search that returns
great results.

Otis raised the patch I submitted for MultiSearcher.  This addresses a
related problem, in that the current MultiSearcher does not rank results
equivalently to a single unified index -- specifically it fails Daniel
Naber's test case.  However, this is just a simple bug whose fix doesn't
require the new normalization.  I submitted a patch to fix that bug,
along with a caveat that I'm not sure the patch is complete, or even
consistent with the intentions of the author of this mechanism.

I'm glad to see this topic is generating some interest, and apologize if
anything I've said comes across as overly abrasive.  I use and really
like Lucene.  I put a lot of focus on creating a great experience for
the end user, and so am perhaps more concerned about quality of results
and certain UI aspects than most other users.

Chuck

  > -----Original Message-----
  > From: Doug Cutting [mailto:[EMAIL PROTECTED]
  > Sent: Wednesday, December 15, 2004 12:35 PM
  > To: Lucene Users List
  > Subject: Re: A question about scoring function in Lucene
  > 
  > Chris Hostetter wrote:
  > > For example, using the current scoring equation, if i do a search for
  > > "Doug Cutting" and the results/scores i get back are...
  > >       1:   0.9
  > >       2:   0.3
  > >       3:   0.21
  > >       4:   0.21
  > >       5:   0.1
  > > ...then there are at least two meaningful pieces of data I can glean:
  > >    a) document #1 is significantly better than the other results
  > >    b) document #3 and #4 are both equally relevant to "Doug Cutting"
  > >
  > > If I then do a search for "Chris Hostetter" and get back the following
  > > results/scores...
  > >       9:   0.9
  > >       8:   0.3
  > >       7:   0.21
  > >       6:   0.21
  > >       5:   0.1
  > >
  > > ...then I can assume the same corresponding information is true about my
  > > new search term (#9 is significantly better, and #7/#6 are equally good)
  > >
  > > However, I *cannot* say either of the following:
  > >   x) document #9 is as relevant for "Chris Hostetter" as document #1 is
  > >      relevant to "Doug Cutting"
  > >   y) document #5 is equally relevant to both "Chris Hostetter" and
  > >      "Doug Cutting"
  > 
  > That's right.  Thanks for the nice description of the issue.
  > 
  > > I think the OP is arguing that if the scoring algorithm was modified in
  > > the way they suggested, then you would be able to make statements x & y.
  > 
  > And I am not convinced that, with the changes Chuck describes, one can
  > be any more confident of x and y.
  > 
  > Doug
  > 
  > ---------------------------------------------------------------------
  > To unsubscribe, e-mail: [EMAIL PROTECTED]
  > For additional commands, e-mail: [EMAIL PROTECTED]

