RE: Boolean Scorer

Chuck Williams Mon, 13 Dec 2004 13:16:26 -0800

Daniel,

The test case is now attached as Bug #32674.  It's commented with lines
from the email below to make the correspondence easy to follow.  Please
let me know your thoughts.


Chuck

  > -----Original Message-----
  > From: Chuck Williams [mailto:[EMAIL PROTECTED]
  > Sent: Sunday, December 12, 2004 11:23 AM
  > To: Lucene Developers List
  > Subject: RE: Boolean Scorer
  > 
  > Daniel,
  > 
  > A perfectly reasonable request -- I'll put together a simple test case
  > but can't do it today.
  > 
  > The problem is with scoring -- nothing to do with AND queries.
  > 
  > The test will run along these lines:
  >   1.  Use a custom similarity to eliminate all tf and idf effects, just
  > to isolate what is being tested.
  >   2.  Create two documents doc1 and doc2, each with two fields title and
  > description.  doc1 has "elephant" in title and "elephant" in
  > description.  doc2 has "elephant" in title and "albino" in description.
  >   3.  Express query for "albino elephant" against both fields.
  > Problems:
  >       a.  MultiFieldQueryParser won't recognize either document as
  > containing both terms, due to the way it expands the query across
  > fields.
  >       b.  Expressing query as "title:albino description:albino
  > title:elephant description:elephant" will score both documents
  > equivalently, since each matches two query terms.
  >   4.  Comparison to MaxDisjunctionQuery and my method for expanding
  > queries across fields.  Using notation that () represents a BooleanQuery
  > and {} represents a MaxDisjunctionQuery, "albino elephant" expands to:
  >         ( {title:albino description:albino}
  >           {title:elephant description:elephant} )
  > This will recognize that doc2 has both terms matched while doc1 only has
  > 1 term matched, and so score doc2 over doc1.
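
The comparison in steps 3b and 4 can be sketched in plain Java with no
Lucene dependency.  This is only an illustration under stated assumptions:
every matching field:term clause scores 1.0 (which is what the custom
similarity in step 1 is meant to achieve), coord and norms are ignored, and
the names FieldExpansionSketch, sumOverFields and maxPerTerm are
hypothetical, not part of Lucene or of MaxDisjunctionQuery.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FieldExpansionSketch {
    static final List<String> FIELDS = List.of("title", "description");
    static final List<String> QUERY = List.of("albino", "elephant");

    // doc1 and doc2 from step 2: field name -> set of terms in that field.
    static final Map<String, Set<String>> DOC1 = Map.of(
            "title", Set.of("elephant"),
            "description", Set.of("elephant"));
    static final Map<String, Set<String>> DOC2 = Map.of(
            "title", Set.of("elephant"),
            "description", Set.of("albino"));

    // Assumption: tf and idf are flattened, so a matching clause scores 1.0.
    static double termScore(Map<String, Set<String>> doc, String field, String term) {
        return doc.getOrDefault(field, Set.of()).contains(term) ? 1.0 : 0.0;
    }

    // Flat expansion (step 3b): sum every field:term clause.
    static double sumOverFields(Map<String, Set<String>> doc, List<String> terms) {
        double total = 0.0;
        for (String term : terms) {
            for (String field : FIELDS) {
                total += termScore(doc, field, term);
            }
        }
        return total;
    }

    // Max-per-term expansion (step 4): best field per term, summed over terms.
    static double maxPerTerm(Map<String, Set<String>> doc, List<String> terms) {
        double total = 0.0;
        for (String term : terms) {
            double best = 0.0;
            for (String field : FIELDS) {
                best = Math.max(best, termScore(doc, field, term));
            }
            total += best;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("sum: doc1=" + sumOverFields(DOC1, QUERY)
                + " doc2=" + sumOverFields(DOC2, QUERY));
        System.out.println("max: doc1=" + maxPerTerm(DOC1, QUERY)
                + " doc2=" + maxPerTerm(DOC2, QUERY));
    }
}
```

Under these assumptions the flat sum cannot separate the two documents,
while the max-per-term form scores doc2 above doc1.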
  > 
  > Refinement note:  the actual expansion for "albino elephant" that I use
  > is:
  >         ( {title:albino description:albino}~0.1
  >           {title:elephant description:elephant}~0.1 )
  > This causes the score of each MaxDisjunctionQuery to be the score of the
  > highest-scoring MDQ subclause plus 0.1 times the sum of the scores of
  > the other MDQ subclauses.  Thus, doc1 gets some credit for also having
  > "elephant" in the description but only 1/10 as much as doc2 gets for
  > covering another query term in its description.  If doc3 has "elephant"
  > in title and both "albino" and "elephant" in the description, then with
  > the actual refined expansion, it gets the highest score of all (whereas
  > with pure max, without the 0.1, it would get the same score as doc2).
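
The refined rule (highest subclause score plus 0.1 times the sum of the
others) can be sketched the same way in plain Java.  Again this is an
illustration, not the MaxDisjunctionQuery implementation: it assumes each
matching field:term clause scores 1.0 with tf, idf, coord and norms
ignored, and TieBreakerSketch and score are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TieBreakerSketch {
    static final List<String> FIELDS = List.of("title", "description");
    static final List<String> QUERY = List.of("albino", "elephant");

    // Documents as described in the message: field name -> terms in that field.
    static final Map<String, Set<String>> DOC1 = Map.of(
            "title", Set.of("elephant"),
            "description", Set.of("elephant"));
    static final Map<String, Set<String>> DOC2 = Map.of(
            "title", Set.of("elephant"),
            "description", Set.of("albino"));
    static final Map<String, Set<String>> DOC3 = Map.of(
            "title", Set.of("elephant"),
            "description", Set.of("albino", "elephant"));

    // For each query term: best field score plus tieBreaker times the sum of
    // the remaining field scores; the per-term results are then added up.
    static double score(Map<String, Set<String>> doc, List<String> terms, double tieBreaker) {
        double total = 0.0;
        for (String term : terms) {
            double best = 0.0;
            double sum = 0.0;
            for (String field : FIELDS) {
                // Assumption: a matching clause scores 1.0, a miss scores 0.0.
                double s = doc.getOrDefault(field, Set.of()).contains(term) ? 1.0 : 0.0;
                best = Math.max(best, s);
                sum += s;
            }
            total += best + tieBreaker * (sum - best);
        }
        return total;
    }

    public static void main(String[] args) {
        for (double tieBreaker : new double[] {0.0, 0.1}) {
            System.out.println("tieBreaker=" + tieBreaker
                    + " doc1=" + score(DOC1, QUERY, tieBreaker)
                    + " doc2=" + score(DOC2, QUERY, tieBreaker)
                    + " doc3=" + score(DOC3, QUERY, tieBreaker));
        }
    }
}
```

Under these assumptions a tie-breaker of 0.1 ranks doc3 above doc2 above
doc1, while a tie-breaker of 0 (pure max) leaves doc3 tied with doc2.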
  > 
  > In real apps, tf's and idf's also come into play of course, but can
  > affect these either way (i.e., mitigate this fundamental problem or
  > exacerbate it).
  > 
  > Chuck
  > 
  >   > -----Original Message-----
  >   > From: Daniel Naber [mailto:[EMAIL PROTECTED]
  >   > Sent: Sunday, December 12, 2004 2:24 AM
  >   > To: Lucene Developers List
  >   > Subject: Re: Boolean Scorer
  >   >
  >   > On Sunday 12 December 2004 04:01, Chuck Williams wrote:
  >   >
  >   > > I maintain the belief that max is *required* to implement reasonable
  >   > > multi-field searching (1).
  >   >
  >   > Could you give a small example -- preferably a test case -- that shows
  >   > what the problem is? I know it has been discussed before but I hadn't
  >   > been able to follow that discussion closely enough. I assume the
  >   > problem is in the scoring, not in MultiFieldQueryParser.
  >   > MultiFieldQueryParser has a different problem, namely that it doesn't
  >   > correctly work with AND queries. Or is that the issue you're talking
  >   > about? Anyway, that will be fixed soon.
  >   >
  >   > Regards
  >   >  Daniel
  >   >
  >   > --
  >   > http://www.danielnaber.de
  >   >
  >   > ---------------------------------------------------------------------
  >   > To unsubscribe, e-mail: [EMAIL PROTECTED]
  >   > For additional commands, e-mail: [EMAIL PROTECTED]