boost:(+petroleum +engineer +refinery) (+contents:(+petroleum +engineer +refinery) +((*:* -boost:petroleum) (*:* -boost:engineer) (*:* -boost:refinery)))
That's an interesting solution. Would this result in many more documents
being visited by the scorer, possibly impacting performance? (I haven't
tried it yet).

Thanks,
Peter

On Thu, Nov 6, 2008 at 6:56 PM, Steven A Rowe <[EMAIL PROTECTED]> wrote:

> Hi Peter,
>
> On 11/06/2008 at 4:25 PM, Peter Keegan wrote:
> > I've discovered another flaw in using this technique:
> >
> > (+contents:petroleum +contents:engineer +contents:refinery)
> > (+boost:petroleum +boost:engineer +boost:refinery)
> >
> > It's possible that the first clause will produce a matching
> > doc and none of the terms in the second clause are used to
> > score that doc. Yet another reason to use BoostingTermQuery.
>
> I think you could address this, without BTQ, using something like:
>
> boost:(+petroleum +engineer +refinery)
> (+contents:(+petroleum +engineer +refinery)
>  +((*:* -boost:petroleum)
>    (*:* -boost:engineer)
>    (*:* -boost:refinery)))
>
> The last three lines give you the set of documents that are missing at
> least one of the terms in the "boost" field. The *:* thingy, indicating a
> MatchAllDocsQuery, is necessary to get all documents that don't have a given
> term; Lucene's (sub-)query document exclusion operation needs a non-empty
> set on which to operate.
>
> On 11/06/2008 at 1:08 PM, Peter Keegan wrote:
> > Then, at search time, a query for "petroleum engineer" gets rewritten
> > to: (+contents:petroleum +contents:engineer) (+boost:petroleum
> > +boost:engineer). Note that the two clauses are OR'd so that a term that
> > exists in both fields will get a higher weight in the 'boost' field.
> > This works quite well at boosting documents with terms that exist in the
> > boosted fields. However, it doesn't work properly if excluded terms are
> > added, for example:
> >
> > (+contents:petroleum +contents:engineer -contents:drilling)
> > (+boost:petroleum +boost:engineer -boost:drilling)
> >
> > If a document contains the term 'drilling' in the 'body'
> > field, but not in the 'title' or 'city' field, a false hit occurs.
>
> I think you could address this problem like this:
>
> +(boost:(+petroleum +engineer)
>   (+contents:(+petroleum +engineer)
>    +((*:* -boost:petroleum)
>      (*:* -boost:engineer))))
> -contents:drilling
>
> You don't have to include "-boost:drilling", because this condition is
> entailed by "-contents:drilling".
>
> Steve
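
For anyone who wants to check the logic of the rewrite quoted above without an index at hand, here is a minimal sketch in Python. It is not Lucene code: it only builds the query string in the shape Steve proposed and evaluates the same boolean logic on a toy document. The helper names (build_query, matches, naive_matches) and the sample document are hypothetical, and the sketch assumes that every term indexed in "boost" is also indexed in "contents", which is what makes "-boost:drilling" redundant.

```python
# Toy model of the rewritten query. Not Lucene code; all names are hypothetical.

def build_query(required, excluded=()):
    """Build a query string in the shape proposed in the thread."""
    plus = " ".join("+" + t for t in required)
    # One (*:* -boost:term) clause per required term: docs missing that term.
    missing = " ".join("(*:* -boost:%s)" % t for t in required)
    core = "boost:(%s) (+contents:(%s) +(%s))" % (plus, plus, missing)
    if not excluded:
        return core
    minus = " ".join("-contents:" + t for t in excluded)
    return "+(%s) %s" % (core, minus)


def matches(doc, required, excluded=()):
    """Evaluate the rewritten query's boolean logic on one document.

    doc is a dict with two sets of terms: doc["boost"] and doc["contents"].
    """
    boost, contents = doc["boost"], doc["contents"]
    all_in_boost = all(t in boost for t in required)
    all_in_contents = all(t in contents for t in required)
    missing_from_boost = any(t not in boost for t in required)
    hit = all_in_boost or (all_in_contents and missing_from_boost)
    return hit and not any(t in contents for t in excluded)


def naive_matches(doc, required, excluded=()):
    """Evaluate the original two-clause query, with exclusions in each clause."""
    def clause(field):
        terms = doc[field]
        return (all(t in terms for t in required)
                and not any(t in terms for t in excluded))
    return clause("contents") or clause("boost")


# The false-hit case from the thread: 'drilling' appears only in the body.
false_hit_doc = {
    "boost": {"petroleum", "engineer"},
    "contents": {"petroleum", "engineer", "drilling"},
}
```

With false_hit_doc, naive_matches(false_hit_doc, ["petroleum", "engineer"], ["drilling"]) accepts the document through the boost clause, while matches(...) with the same arguments rejects it. The sketch says nothing about scoring or about how many documents the scorer visits, so it does not answer the performance question.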