Try out: http://issues.apache.org/jira/browse/LUCENE-850
If this is useful to you, be sure to add a comment to the issue.
-Mike
On 3-Jul-07, at 10:51 AM, Tim Sturge wrote:
I'm following myself up here to ask if anyone has experience or
code with a BooleanQuery that weights the terms it encounters on a
product basis rather than a sum basis.
This would effectively rank by the geometric mean of the term scores
(rather than the arithmetic mean) and would give me more "middle
bias". It also has the great advantage that it automatically
implements AND: a document missing a term scores 0.0 for that term,
which drives the whole query score to 0.0 as well.
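
To make the difference concrete, here is a minimal sketch in plain
Python. It is not Lucene code; the function names and the sample
scores are made up for illustration, and it assumes per-term scores
are never negative.

```python
import math

def arithmetic_mean(scores):
    """Sum-based combination: what a plain sum of clauses ranks by."""
    return sum(scores) / len(scores)

def geometric_mean(scores):
    """Product-based combination, computed in log space.

    Assumes scores are non-negative. A zero score (term missing)
    forces the result to 0.0, which is the implicit AND behaviour.
    """
    if any(s == 0.0 for s in scores):
        return 0.0
    return math.exp(sum(math.log(s) for s in scores) / len(scores))

# Hypothetical per-term scores, all with the same arithmetic mean.
balanced = [4.0, 4.0]   # both terms match equally well
lopsided = [7.0, 1.0]   # one strong match, one weak match
missing = [8.0, 0.0]    # second term does not match at all

for name, scores in [("balanced", balanced),
                     ("lopsided", lopsided),
                     ("missing", missing)]:
    print(name, arithmetic_mean(scores), geometric_mean(scores))
```

All three cases tie under the arithmetic mean, while the geometric
mean puts the balanced document first and the one with a missing
term last.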
I'm curious though why this doesn't already exist. Is it a bad idea
in general (that I will discover once I implement it and look at
the results?) or does it make searching a lot slower?
Thanks,
Tim
Tim Sturge wrote:
I have an index with two different sources of information, one
small but of high quality (call it "title"), and one large, but of
lower quality (call it "body"). I give boosts to certain
documents related to their popularity (this is very similar to
what one would do indexing the web).
The problem I have is a query like "John Bush". I translate that
into "(title:John^4.0 body:John) AND (title:Bush^4.0 body:Bush)".
But the results I get are:
1. George Bush
...
4. John Kerry
...
10. John Bush
The reason is (looking at explain) that George Bush is scored:
169 = sum(
  1 = sum(
    1 = <match in body with tiny norm for "John">
  )
  168 = sum(
    160 = <title match for "Bush">
    8 = <body match for "Bush">
  )
)
and John Kerry is similar but reversed. Poor old "John Bush" only
scores:
72 = sum(
  40 = (<title match for "John"> + <body match>)
  32 = (<title match for "Bush"> + <body match>)
)
because his initial boost was only 1/4 of George's.
The question I have is, how can I tell the searcher to care about
"balance"? I really want the score over 2 terms to be more like
(sqrt(X)+sqrt(Y))^2 or maybe even exp(log(X)+log(Y)) rather than
just X+Y. Is that supported in some obvious way, or is there some
other way to phrase my query to say "I want both terms but they
should both be important if possible?"
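
As a rough check of those candidates, the sketch below applies each
one to the per-term subtotals from the explain output above (1 and
168 for George Bush, 40 and 32 for John Bush). It is plain Python
arithmetic, not a Lucene query; the function and variable names are
invented for the example.

```python
import math

def by_sum(x, y):
    """Current behaviour: X + Y."""
    return x + y

def by_sqrt(x, y):
    """(sqrt(X) + sqrt(Y))^2"""
    return (math.sqrt(x) + math.sqrt(y)) ** 2

def by_product(x, y):
    """exp(log(X) + log(Y)), i.e. X * Y for positive scores."""
    return math.exp(math.log(x) + math.log(y))

# Per-term subtotals taken from the explain output in this message.
scores = {
    "George Bush": (1.0, 168.0),
    "John Bush": (40.0, 32.0),
}

def rank(combine):
    """Return document names ordered best-first under `combine`."""
    return sorted(scores, key=lambda name: combine(*scores[name]),
                  reverse=True)

for combine in (by_sum, by_sqrt, by_product):
    print(combine.__name__, rank(combine))
```

With these particular numbers the square-root form still ranks
George Bush first (about 195 against about 144); only the product
form puts John Bush on top (1280 against 168).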
Thanks,
Tim
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]