Thanks Dave,

Having boost seemingly absent from the explain calculation confused me,
but your explanation of field_norm helps a lot. 

Neville

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of David Balmain
Sent: Wednesday, 20 September 2006 8:22 PM
To: [email protected]
Subject: Re: [Ferret-talk] Understanding boost ?

On 9/20/06, Neville Burnell <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm confused about managing field boosting ...
>
> I have set the :boost for the :name field in my docs to 10, via
> :boost => 10
>
> Then I performed a search for 'keith' over all fields with 
> *:(keith*), expecting a doc with Keith in the :name field to come out 
> on top. But another doc with Keith mentioned in other fields 
> (:comments,
> :address) scored higher.
>
> I viewed the explanation from the searcher, but it wasn't clear to me 
> why the boost wasn't pushing the :name = Keith document to the top.
>
> Any help on understanding field boosting and explain would be great.
>
> Regards
>
> Neville
>
> PS, the two explains are:
>
> Doc1:
> 0.3352959 = product of:
>   8.047102 = sum of:
>     4.011141 = weight(comments:<keith|[EMAIL PROTECTED]|keithex> in 
> 4697), product of:
>       0.5685414 =
> query_weight(comments:<keith|[EMAIL PROTECTED]|keithex>), product of:
>         28.22057 = idf(comments:<(keithex=1) + ([EMAIL PROTECTED]) +
> (keith=115) = 117>)
>         0.02014635 = query_norm
>       7.055143 = 
> field_weight(comments:<keith|[EMAIL PROTECTED]|keithex>
> in 4697), product of:
>         1.0 = The sum of:
>           1.0 = tf(term_freq(comments:keithex)=1)^1.0
>         28.22057 = idf(comments:<(keithex=1) + ([EMAIL PROTECTED]) +
> (keith=115) = 117>)
>         0.25 = field_norm(field=comments, doc=4697)
>     4.03596 = weight(address:<keith|keithex> in 4697), product of:
>       0.4032613 = query_weight(address:<keith|keithex>), product of:
>         20.0166 = idf(address:<(keithex=1) + (keith=8) = 9>)
>         0.02014635 = query_norm
>       10.0083 = field_weight(address:<keith|keithex> in 4697), product
> of:
>         1.0 = The sum of:
>           1.0 = tf(term_freq(address:keithex)=1)^1.0
>         20.0166 = idf(address:<(keithex=1) + (keith=8) = 9>)
>         0.5 = field_norm(field=address, doc=4697)
>   0.04166667 = coord(2/48)
>
>
> Doc2:
> 0.2977623 = product of:
>   14.29259 = weight(name:<keith> in 31416), product of:
>     0.2028171 = query_weight(name:<keith>), product of:
>       10.06719 = idf(name:<(keith=3) = 3>)
>       0.02014635 = query_norm
>     70.47034 = field_weight(name:<keith> in 31416), product of:
>       1.0 = The sum of:
>         1.0 = tf(term_freq(name:keith)=1)^1.0
>       10.06719 = idf(name:<(keith=3) = 3>)
>       7.0 = field_norm(field=name, doc=31416)
>   0.02083333 = coord(1/48)

Hi Neville,

The field's boost value affects the field_norm value in the Explanations
above. Here is how it is calculated:

    field_norm = field_info->boost * doc->boost * field->boost *
                (1 / sqrt(field->num_terms))

So as you can see from the Explanations above, field_norm is 7.0 on the
boosted field which is more than 10 times the field_norms on the other
two fields (0.25, 0.5) so at least you can see the boost is having an
effect. The address field probably has a higher field_norm value than
the comments field because the comments field is longer (see that last
part of the field_norm equation). Note that the reason the field_norm is
7.0 and not 10.0 is that it gets stored in a single byte, so there is
quite a large loss of precision.
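
As a rough illustration of the formula above, here is a small Ruby sketch
that computes the field_norm before it is stored, for a few field
lengths. The helper name raw_field_norm and the term counts are
assumptions made up for this example; it is not part of Ferret's API and
it does not model the lossy single-byte encoding.

```ruby
# Sketch of the field_norm formula quoted above. `raw_field_norm` is a
# hypothetical helper, not a Ferret method; it ignores the lossy
# single-byte encoding applied when the norm is stored in the index.
def raw_field_norm(field_info_boost:, num_terms:, doc_boost: 1.0, field_boost: 1.0)
  field_info_boost * doc_boost * field_boost * (1.0 / Math.sqrt(num_terms))
end

# Hypothetical term counts, chosen only to show the effect of field length.
[1, 2, 4, 16].each do |n|
  boosted   = raw_field_norm(field_info_boost: 10.0, num_terms: n)
  unboosted = raw_field_norm(field_info_boost: 1.0, num_terms: n)
  puts format("num_terms=%-2d boosted=%.4f unboosted=%.4f", n, boosted, unboosted)
end
```

With no boost, a 4-term field comes out at 0.5 and a 16-term field at
0.25, which would be consistent with the address and comments values if
those fields are roughly that long. A boost of 10 on a two-term field
gives about 7.07 before encoding, so field length may contribute to the
7.0 seen above alongside the precision loss.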

Having said all this, there does seem to be a problem with the
calculations. I don't think I've calculated the idf value correctly for
MultiTermQueries. I've rectified this in subversion so the next version
should give your results in an order that you'd expect.
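
To see how much the idf values matter here, the numbers from the two
explanations can be multiplied back together. The sketch below is only
arithmetic on the figures quoted above; the names term_weight and
QUERY_NORM are made up for this example and are not Ferret methods.

```ruby
# Values copied from the two explanations; names are illustrative only.
QUERY_NORM = 0.02014635

# weight = query_weight * field_weight
#        = (idf * query_norm) * (tf * idf * field_norm)
# Note that idf appears twice, so it enters the weight squared.
def term_weight(idf:, field_norm:, tf: 1.0, query_norm: QUERY_NORM)
  (idf * query_norm) * (tf * idf * field_norm)
end

# Doc1 matches in two fields (comments and address): coord(2/48).
doc1 = (term_weight(idf: 28.22057, field_norm: 0.25) +
        term_weight(idf: 20.0166,  field_norm: 0.5)) * (2.0 / 48)

# Doc2 matches only in the boosted name field: coord(1/48).
doc2 = term_weight(idf: 10.06719, field_norm: 7.0) * (1.0 / 48)

puts format("doc1=%.4f doc2=%.4f", doc1, doc2)
```

This reproduces the reported scores of roughly 0.335 and 0.298. Because
idf is squared in the weight, the large idf values on the comments and
address fields (28.2 and 20.0, against 10.1 for name), together with the
doubled coord factor, are enough to outweigh the much larger field_norm
on the name field.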

For information on tf and idf, check out this page:

    http://en.wikipedia.org/wiki/Tf-idf

Hope that helps. I'd love to give a better explanation of the scoring
but I don't have time right now.

Cheers,
Dave
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk