On Tue, 22 May 2001, Gilles Detillieux wrote:
> > For example, the pages which were the top matches
> > with version 3.1.5 are way down on the list with 3.2.0b3,
> > even though the <TITLE> of each of those pages contains the
> > keywords being searched for!
>
> Well, I'm not aware of changes in 3.2 that would alter relative weights
> of different types of words that drastically.
I am. I haven't changed the weighting much at all, though the weighting
formula really does need some drastic changes. (For example, no current
version of ht://Dig weights words by how frequent they are across the
collection; ideally you would all but ignore common words and favor rare
ones.)
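To make the "favor rare words" idea concrete, here is a minimal sketch of the textbook inverse document frequency (IDF) weight, log(N / df). This is NOT ht://Dig code (as noted, no version does this); the helper name idf_weights and the toy documents are made up for illustration.

```python
import math

def idf_weights(docs):
    """Hypothetical helper: map each word to log(N / df), where N is the
    number of documents and df is how many documents contain the word."""
    n_docs = len(docs)
    doc_freq = {}
    for words in docs:
        for word in set(words):          # count each word once per document
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}

docs = [
    ["the", "htdig", "search", "engine"],
    ["the", "weighting", "formula"],
    ["the", "title", "factor"],
]
weights = idf_weights(docs)
print(weights["the"])              # 0.0: in every document, so effectively ignored
print(round(weights["htdig"], 3))  # 1.099: in only one document, so favored
```

A word that appears everywhere gets weight zero, while a word found in one document out of three gets log(3).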
However, in 3.1 and earlier, words near the top of a document get
significantly more weight than words near the end. This is not true in
3.2: words are weighted equally throughout the document. See below.
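The difference can be pictured with a small sketch. The linear decay below is an invented stand-in chosen only to show "top words count more"; it is not the actual 3.1 formula, and the function names and the boost parameter are hypothetical. The flat function mirrors the 3.2 behavior described above.

```python
def position_weight_decaying(position, doc_length, boost=2.0):
    """Hypothetical 3.1-style weight: the first word gets `boost` times the
    weight of the last word, falling linearly in between (illustration only)."""
    if doc_length <= 1:
        return boost
    fraction = position / (doc_length - 1)   # 0.0 at the top, 1.0 at the end
    return boost - (boost - 1.0) * fraction

def position_weight_flat(position, doc_length):
    """3.2-style behavior as described: every position counts the same."""
    return 1.0

doc_length = 101
for pos in (0, 50, 100):
    print(pos, position_weight_decaying(pos, doc_length),
          position_weight_flat(pos, doc_length))
# 0 2.0 1.0
# 50 1.5 1.0
# 100 1.0 1.0
```

Under the decaying scheme a title-like word near the top gets a head start for free; under the flat scheme any such advantage has to come from the *_factor attributes.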
> You might want to try tweaking your title_factor, as well as other
> *_factor attributes, to see if you can get rankings that are more to
> your liking.
The factors do need to be adjusted to get better weighting. However, I'd
like to wait until we can add things such as a proximity_factor or the
"multiboost" feature, which gives higher weight to documents that contain
all the words in an "any" query. That would save us from tuning the
factors now and then retuning them later.
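As a rough illustration of what "multiboost" is meant to do, here is a minimal sketch assuming a simple multiplicative boost. The function name, the multiboost parameter, and the multiply-the-sum form are all assumptions, not the planned implementation.

```python
def any_query_score(doc_word_scores, query_words, multiboost=2.0):
    """Hypothetical scoring for an "any" (OR) query.

    doc_word_scores maps each word in one document to its score there.
    Sum the scores of the query words present, then multiply by
    `multiboost` if the document contains every query word."""
    query = set(query_words)
    matched = query & set(doc_word_scores)
    score = sum(doc_word_scores[w] for w in matched)
    if query and matched == query:
        score *= multiboost
    return score

doc_a = {"title": 1.0, "factor": 1.0}   # contains both query words
doc_b = {"title": 3.0}                  # contains only one, heavily
query = ["title", "factor"]
print(any_query_score(doc_a, query))    # 4.0
print(any_query_score(doc_b, query))    # 3.0
```

Without the boost, doc_b (3.0) would outrank doc_a (2.0); with it, the document matching all the words comes first. This is why tuning the factors before such a feature exists would likely mean retuning afterward.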
On the plus side, in 3.2 the *_factor attributes can be changed without
reindexing: just tweak and check. So it's a *lot* easier to do. If you
find factors that work well for you, please let us know.
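One way to see why no reindex is needed: if the index records *where* each word occurs (title, body text, and so on) rather than a precomputed score, the factors can be applied at search time. The sketch below assumes exactly that; the INDEX layout and the score function are hypothetical, and text_factor is used here as an assumed attribute name alongside title_factor.

```python
# Hypothetical index: for each (document, word), how many times the word
# occurs in each part of the page. Built once, at index time.
INDEX = {
    ("doc1.html", "htdig"): {"title": 1, "text": 2},
    ("doc2.html", "htdig"): {"text": 6},
}

def score(doc, word, factors):
    """Combine the stored location counts with *_factor values at search time."""
    counts = INDEX.get((doc, word), {})
    return sum(factors.get(loc + "_factor", 0.0) * n for loc, n in counts.items())

factors = {"title_factor": 5.0, "text_factor": 1.0}
print(score("doc1.html", "htdig", factors))  # 7.0 (5*1 + 1*2)
print(score("doc2.html", "htdig", factors))  # 6.0 (1*6)

# Lowering title_factor flips the ranking without touching INDEX.
factors["title_factor"] = 2.0
print(score("doc1.html", "htdig", factors))  # 4.0 (2*1 + 1*2)
```

Since only the factors dictionary changes between runs, you can try a value, search, and compare, which is the "just tweak and check" workflow.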
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
_______________________________________________
htdig-general mailing list
FAQ: http://htdig.sourceforge.net/FAQ.html