On Thu, 2004-03-18 at 13:32, Doug Cutting wrote:Have you tried assigning these very small boosts (0 < boost < 1) and assigning other query clauses relatively large boosts (boost > 1)?
I was trying to formulate a query like, say +(title: asparagus) (doctype:bad)^-3
which would make sure the "bad" document was ranked lower than any other
value for doctype. But negative boosts are illegal.
I tried your suggestion of putting large boost on the first clause and a
small one (0.01) on the second, but the "bad" document is still ranked higher than the good one -- it gets a slight improvement from the
doctype:bad match, times 0.01, which is a very slight improvement but
still positive. Then it gets a big boost because it has a 1.0 rather
than a 0.5 coordination factor, so the bad item gets top billing.
I don't think you understood my proposal. You should try boosting the documents when you add them. Instead of adding a "doctype" field with "good" and "bad" values, use Document.setBoost(0.01) at index time.
Also, you could disable coordination if you like by defining your own Similarity class.
I think I've identified a few ways to solve the puzzle, though:
(a) enumerate all the possible "good" types of documents and search for them, rather than the single bad one. Harder to maintain since doctypes can be introduced, but possible.
That would indeed work better in an additive scoring system like this.
(b) attach boost values less than one to the "bad" Documents at indexing time. Not as flexible as modifying the query, but plausible.
Yes, that's what I proposed. You can reset boost values later now too.
(c) a more complex query like this: (title:asparagus) OR (title:asparagus -doctype:bad) so for good documents both clauses will match and the coordination factor will be in their favor. This increases query complexity (they aren't really simple one-term queries like this toy example), but hopefully that will not be a performance issue.
I think modifying the coordination function would be better. Note that, in the current CVS codebase, you can modify the Similarity implementation on a per-clause basis. So you could construct a query that had negative coordination, i.e., that gives lower scores when more clauses match. This could be done by subclassing BooleanQuery and overriding its getSimilarity(Searcher) method.
Doug
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
