Hi Peter,
On 11/06/2008 at 4:25 PM, Peter Keegan wrote:
> I've discovered another flaw in using this technique:
>
> (+contents:petroleum +contents:engineer +contents:refinery)
> (+boost:petroleum +boost:engineer +boost:refinery)
>
> It's possible that the first clause will produce a matching
> doc and none of the terms in the second clause are used to
> score that doc. Yet another reason to use BoostingTermQuery.
I think you could address this, without BTQ, using something like:
boost:(+petroleum +engineer +refinery)
(+contents:(+petroleum +engineer +refinery)
 +((*:* -boost:petroleum)
   (*:* -boost:engineer)
   (*:* -boost:refinery)))
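The last three lines give you the set of documents that are missing at least
one of the terms in the "boost" field. Because the first clause requires all
three terms in "boost", the two top-level clauses now match disjoint sets of
documents. The *:* thingy, indicating a MatchAllDocsQuery, is necessary to get
all documents that don't have a given term. Lucene's (sub-)query document
exclusion operation needs a non-empty set on which to operate: a sub-query made
up only of prohibited clauses, like (-boost:petroleum), matches no documents.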
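To make the set arithmetic concrete, here's a toy model in plain Java. It is not Lucene code: each document is just the set of terms in a hypothetical "boost" field, and doc IDs are list positions. allExcept() plays the role of (*:* -boost:term), and the union over all terms is the "missing at least one term" set. The last line shows why the *:* is needed: excluding from an empty base set leaves nothing.

```java
import java.util.*;

// Toy model (NOT Lucene) of the set semantics of
//   (*:* -boost:petroleum) (*:* -boost:engineer) (*:* -boost:refinery)
// Each document is represented only by the terms in its (hypothetical) "boost" field.
class MissingTermsSketch {

    // All doc IDs 0..n-1: the role of *:* (MatchAllDocsQuery).
    static Set<Integer> allDocs(int n) {
        Set<Integer> all = new TreeSet<>();
        for (int doc = 0; doc < n; doc++) all.add(doc);
        return all;
    }

    // Doc IDs whose "boost" field contains the term: the role of boost:term.
    static Set<Integer> docsWith(List<Set<String>> boostFields, String term) {
        Set<Integer> result = new TreeSet<>();
        for (int doc = 0; doc < boostFields.size(); doc++) {
            if (boostFields.get(doc).contains(term)) result.add(doc);
        }
        return result;
    }

    // Boolean exclusion: docs in base that are not in excluded.
    static Set<Integer> exclude(Set<Integer> base, Set<Integer> excluded) {
        Set<Integer> result = new TreeSet<>(base);
        result.removeAll(excluded);
        return result;
    }

    // OR of (*:* -boost:t) over all terms = docs missing at least one of the terms.
    static Set<Integer> missingAtLeastOne(List<Set<String>> boostFields, List<String> terms) {
        Set<Integer> all = allDocs(boostFields.size());
        Set<Integer> union = new TreeSet<>();
        for (String t : terms) {
            union.addAll(exclude(all, docsWith(boostFields, t)));
        }
        return union;
    }

    public static void main(String[] args) {
        List<Set<String>> boost = List.of(
            Set.of("petroleum", "engineer", "refinery"), // doc 0: has all three
            Set.of("petroleum", "engineer"),             // doc 1: missing "refinery"
            Set.of());                                   // doc 2: empty boost field
        List<String> terms = List.of("petroleum", "engineer", "refinery");
        System.out.println(missingAtLeastOne(boost, terms)); // [1, 2]
        // Without *:* the base set is empty, so nothing survives the exclusion.
        System.out.println(exclude(new TreeSet<>(), docsWith(boost, "refinery"))); // []
    }
}
```

Doc 0 is left to the first clause, boost:(+petroleum +engineer +refinery), and docs 1 and 2 can only reach the result through the contents clause.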
On 11/06/2008 at 1:08 PM, Peter Keegan wrote:
> Then, at search time, a query for "petroleum engineer" gets rewritten
> to: (+contents:petroleum +contents:engineer) (+boost:petroleum
> +boost:engineer). Note that the two clauses are OR'd so that a term that
> exists in both fields will get a higher weight in the 'boost' field.
> This works quite well at boosting documents with terms that exist in the
> boosted fields. However, it doesn't work properly if excluded terms are
> added, for example:
>
> (+contents:petroleum +contents:engineer -contents:drilling)
> (+boost:petroleum +boost:engineer -boost:drilling)
>
> If a document contains the term 'drilling' in the 'body'
> field, but not in the 'title' or 'city' field, a false hit occurs.
I think you could address this problem like this:
+(boost:(+petroleum +engineer)
  (+contents:(+petroleum +engineer)
   +((*:* -boost:petroleum)
     (*:* -boost:engineer))))
-contents:drilling
You don't have to include "-boost:drilling", because this condition is entailed
by "-contents:drilling", assuming (as in your setup) that every term in the
"boost" field also appears in "contents".
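Here's a similar toy model (again plain Java, not Lucene, and ignoring scoring) of which documents match your original rewrite versus the query above. The Doc class, its two term sets, and the sample documents are hypothetical; the "boost" terms are chosen to also appear in "contents", per the assumption above.

```java
import java.util.*;

// Toy model (NOT Lucene) of match/no-match only; scores are ignored.
class ExclusionSketch {

    // Hypothetical document: the terms in its "contents" and "boost" fields.
    static final class Doc {
        final Set<String> contents;
        final Set<String> boost;
        Doc(Set<String> contents, Set<String> boost) {
            this.contents = contents;
            this.boost = boost;
        }
    }

    // Original rewrite:
    //   (+contents:petroleum +contents:engineer -contents:drilling)
    //   (+boost:petroleum +boost:engineer -boost:drilling)
    static boolean matchesOriginal(Doc d, List<String> required, List<String> excluded) {
        boolean contentsClause = d.contents.containsAll(required)
                && Collections.disjoint(d.contents, excluded);
        boolean boostClause = d.boost.containsAll(required)
                && Collections.disjoint(d.boost, excluded);
        return contentsClause || boostClause;
    }

    // Suggested query:
    //   +(boost:(+petroleum +engineer)
    //     (+contents:(+petroleum +engineer)
    //      +((*:* -boost:petroleum) (*:* -boost:engineer))))
    //   -contents:drilling
    static boolean matches(Doc d, List<String> required, List<String> excluded) {
        boolean clauseA = d.boost.containsAll(required);
        // OR of (*:* -boost:t): at least one required term is missing from "boost".
        boolean missingOne = !d.boost.containsAll(required);
        boolean clauseB = d.contents.containsAll(required) && missingOne;
        // The top-level -contents:drilling applies to every document, whichever clause matched.
        if (!Collections.disjoint(d.contents, excluded)) return false;
        return clauseA || clauseB;
    }

    public static void main(String[] args) {
        List<String> required = List.of("petroleum", "engineer");
        List<String> excluded = List.of("drilling");
        Doc titleHit = new Doc(Set.of("petroleum", "engineer", "houston"),
                               Set.of("petroleum", "engineer"));
        Doc bodyOnly = new Doc(Set.of("petroleum", "engineer"), Set.of());
        Doc drillingInBody = new Doc(Set.of("petroleum", "engineer", "drilling"),
                                     Set.of("petroleum", "engineer"));
        System.out.println(matches(titleHit, required, excluded));               // true
        System.out.println(matches(bodyOnly, required, excluded));               // true
        System.out.println(matches(drillingInBody, required, excluded));         // false
        System.out.println(matchesOriginal(drillingInBody, required, excluded)); // true (false hit)
    }
}
```

The drillingInBody document is the false hit you described: the original rewrite lets it in through the boost clause, while the single top-level -contents:drilling rejects it.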
Steve