Re: Index-time Boosting

Yonik Seeley Tue, 05 Dec 2006 08:54:53 -0800

On 12/5/06, Tracey Jaquith <[EMAIL PROTECTED]> wrote:

Quick intro.  Server Engineer at Internet Archive.
I just spent a mere 3 days porting nearly our entire site to use your
*wonderful* project!


I, too, am looking for a kind of "boosting".
If I understand your reply here, if I reindex *all* my documents with
   <field name="title" boost="100">i'm super, thanks for asking!</field>
and make sure that any subsequent incremental (re)indexing of documents
uses that same extra ' boost="100" ', then I should be making the relevance
of the title in our documents 100x (or whatever that translates to)
"heavier" than other non-title fields, correct?

I know this prolly isn't the relevant place to otherwise gush,
but THANK YOU for this fantastic (and maintained!) code
and we look forward to using this in the near future on our site!
Go opensource!


Welcome aboard!

From a "fresh" user perspective, what was your hardest or most
confusing part of starting to use Solr?

We are most interested in always having "title", "description", and a few
other fields boosted.  We have both user queries of phrases/words as well as
"field-specific" queries (eg: "mediatype:movies AND collection:prelinger"),
so my thought is std might be better than dismax.


Yes, for the example above you want the standard request handler
because you are searching for different things in different fields
rather than the same thing in different fields.

However, there are multiple ways of doing everything...
It looks like at least some of your clauses are restrictions rather
than full-text queries, and can be more efficiently modeled as
filters.  Since filters are cached separately, this can lead to a
large increase in performance.

So in either the standard or dismax handlers, you could do
q="foo bar"&fq=mediatype:movies&fq=collection:prelinger
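As a rough client-side sketch of the filter approach above, the following Python builds the URL-encoded query string with one q parameter and one fq parameter per restriction. The helper name build_select_query is made up for illustration, and the host and handler path you append this to depend on your own installation.

```python
from urllib.parse import urlencode


def build_select_query(user_query, filters):
    """Return a URL-encoded query string: one q, plus one fq per filter."""
    params = [("q", user_query)]
    # Each restriction becomes its own fq parameter so it can be cached separately.
    params.extend(("fq", f) for f in filters)
    return urlencode(params)


query_string = build_select_query(
    '"foo bar"', ["mediatype:movies", "collection:prelinger"]
)
print(query_string)
```

Keeping each restriction in its own fq, rather than ANDing them into q, is what lets each filter be cached and reused on its own.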

I've tried some experiments, adjusting the boosts at index time and running
the std handler to see the ordering of the results change for "fieldless
queries" (eg: "q=tracey+pooh").  I have 33 fields using
<copyField dest="text" source="..."/> (where "text" is our default field to
query) to allow for checking across most of our std XML fields.  I gather
that a boost applied to "title" on indexing a document must somehow
"propagate" to the "text" field?


Background: for an indexed field name there is a single boost value
per document.  This is true even if the field is multi-valued... all
values for that document "share" the same boost.  This is a Lucene
restriction so we can't fix it in Solr in any way.

Solr *does* propagate the index-time boost when doing copyField, but
the copied boost just ends up being multiplied into the single boost
shared by all values of the destination field for that document.
Matches on the resulting text field will therefore *always* score
higher, regardless of which "part" matched.  Does that make sense?
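A toy model of that effect, in Python. This is only an illustration of "one shared boost per field per document", not Lucene's actual scoring code; it ignores length normalization and every other scoring factor, and the field names and boost values are made up.

```python
from math import prod


def shared_field_boost(value_boosts):
    """Toy model: the field's single boost is the product of its values' boosts."""
    return prod(value_boosts)


# Boosts carried by each value copied into the "text" field of one document.
copied = {"title": 100.0, "description": 1.0, "subject": 1.0}
text_boost = shared_field_boost(copied.values())

# The same multiplier applies no matter which source the matching term came from.
multiplier_by_source = {source: text_boost for source in copied}
print(multiplier_by_source)
```

In this model a match on text that came from "description" is boosted exactly as much as a match on text that came from "title", which is why the index-time boost does not single out the title.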

Index time boosts can make sense if you want to boost the importance
of certain *documents*.  Query time boosts make more sense when you
want certain fields or certain search terms to count more than others.

So if you want to search across your general text field, while at the
same time boosting the title field, you could do:

q="foo bar" title:"foo bar"^10

Or you could search across all the fields individually, giving them
all different boosts:
q=subject:foo^3 title:foo^10 body:foo
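If you build such queries in client code, a minimal sketch could look like the following. The helper name boosted_query is hypothetical, and it assumes a single plain term that needs no quoting or escaping.

```python
def boosted_query(term, field_boosts):
    """Build 'field:term^boost' clauses, omitting the boost when it is 1."""
    clauses = []
    for field, boost in field_boosts.items():
        clause = f"{field}:{term}"
        if boost != 1:
            clause += f"^{boost}"
        clauses.append(clause)
    return " ".join(clauses)


q = boosted_query("foo", {"subject": 3, "title": 10, "body": 1})
print(q)
```

The resulting string still has to be URL-encoded before it is sent as the q parameter.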

The dismax handler has a different way of specifying fields to search
across and boosts:
q=foo&qf=subject^3,title^10,body

If you really want index-time boosts, there was a bug fix to
index-time field boosts on 11/3, so make sure you are using a later
version.
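For completeness, here is a small Python sketch that generates an add message with a field-level boost attribute like the one quoted earlier in this thread. The helper name build_add_xml and the "description" field value are assumptions for illustration; posting the result to your update URL is left out.

```python
import xml.etree.ElementTree as ET


def build_add_xml(fields, boosts=None):
    """Return an <add><doc>...</doc></add> string with optional per-field boosts."""
    boosts = boosts or {}
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        attrs = {"name": name}
        if name in boosts:
            # Index-time boost travels as an attribute on the field element.
            attrs["boost"] = str(boosts[name])
        field = ET.SubElement(doc, "field", attrs)
        field.text = value
    return ET.tostring(add, encoding="unicode")


add_xml = build_add_xml(
    {"title": "i'm super, thanks for asking!", "description": "an example"},
    boosts={"title": 100},
)
print(add_xml)
```

Remember that every later (re)index of the same document has to send the same boost, otherwise the value stored for that document changes.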

-Yonik
