On 9/27/06, Colin Cc <[EMAIL PROTECTED]> wrote:
> Lucene, and perhaps most search engines, are biased towards small fields
> with little content (where thus the term frequency is higher). Lucene
> has the option to define a custom (Similarity) class to calculate the
> similarity between two fields (custom calculation of lengthNorm and tf)
> in different documents. But how do I do this in ferret? (I know to boost
> a field, but this is not what I (think to) need, I need to be able to
> influence the relative importance between the same field)
>
Hi Colin,
Ferret uses the same similarity scoring as Lucene. Scoring is based
more on the ratio of number of matches to the length of the field,
rather than just the length of the field. So a small field with a
single match will score higher than a large field with a single match.
But a large field with many matches may still score more highly than a
small field with a single match.
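
To make that concrete, here is a rough sketch of the two factors involved, assuming the formulas from Lucene's default similarity (tf is the square root of the match count, lengthNorm is one over the square root of the number of terms in the field). It leaves out idf, coord and query normalisation, and the method names are just for illustration, they are not part of the Ferret API:

```ruby
# Simplified sketch of Lucene-style scoring factors.
# idf, coord and query normalisation are deliberately left out.
# The method names are illustrative, not Ferret API calls.

# Term-frequency factor: grows with the square root of the match count.
def tf(match_count)
  Math.sqrt(match_count)
end

# Length normalisation: shrinks as the field gets longer.
def length_norm(term_count)
  1.0 / Math.sqrt(term_count)
end

# Rough per-field contribution to the score.
def rough_score(match_count, term_count)
  tf(match_count) * length_norm(term_count)
end

small_single = rough_score(1, 4)     # small field, one match
large_single = rough_score(1, 100)   # large field, one match
large_many   = rough_score(36, 100)  # large field, many matches

puts small_single > large_single  # true
puts large_many > small_single    # true
```

So field length alone doesn't decide the ranking; it is the match count relative to the length that matters.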
The Similarity class is still unavailable in the Ruby API and it isn't
high on my list of priorities to write the bindings for it (unless
someone were willing to compensate me). However, I don't think you need
it for what you are describing. Boosts should do the job perfectly. If
you want to make the :title field more important than the :content
field then you set the boost of the :title FieldInfo, probably like
this:

    fi = Ferret::Index::FieldInfos.new
    fi.add_field(:title, :boost => 10.0)

But I think you want to make the same field more important in
different documents. So you can set the boost of the field when you
add it. You can either set the boost for the whole document:

    doc = Ferret::Document.new(20.0)
    doc[:title] = "Braveheart"
    doc[:actors] = ["Mel Gibson", "Sophie Marceau"]

This will affect all fields in the document. Or you can set the boost
of the field directly.

    doc = {
      :title => Ferret::Field.new("Legally Blonde", 0.02),
      :actors => Ferret::Field.new(["Reese Witherspoon", "Luke Wilson"], 2.0)
    }

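
As a rough picture of why a per-document or per-field boost changes the relative importance of the same field across documents, here is a sketch. It assumes the boosts are multiplied into the field's length norm at index time, as Lucene does, and it ignores the lossy compression applied when the norm is stored. The method names are hypothetical, not Ferret API calls:

```ruby
# Sketch only: assumes boosts are multiplied into the field's norm,
# as in Lucene. Method names are hypothetical, not Ferret API calls.

# Length normalisation: shrinks as the field gets longer.
def length_norm(term_count)
  1.0 / Math.sqrt(term_count)
end

# Combined index-time factor for one field of one document.
def field_norm(term_count, doc_boost: 1.0, field_boost: 1.0)
  doc_boost * field_boost * length_norm(term_count)
end

# The same four-term field in three different documents.
plain   = field_norm(4)                     # no boost
boosted = field_norm(4, field_boost: 2.0)   # field boosted
damped  = field_norm(4, field_boost: 0.02)  # field played down

puts boosted > plain  # true
puts plain > damped   # true
```

Under that assumption, two documents with identical :title lengths and match counts will still rank differently if one was indexed with a higher boost.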
Hope that helps,
Cheers,
Dave