On Wed, 29 Mar 2000, Toivo Pedaste wrote:
> I'm using a version of HTDIG from a couple of days ago and the scoring
> isn't working properly. I noticed that the result ordering is very
> strange and it turned out to be because the score was dominated by
> the backlink factor.
Yeah, I really need to tinker with the default *_factors. I've been
messing with the way documents are scored and it's throwing things off.
The biggest change was dividing individual word scores by the global word
frequency. This is why word scores dropped so dramatically. I'd also like to
multiply by the document word frequency, but AFAIK there's currently no way
to calculate that.
(Dividing by the global word frequency makes common words less important in
a multi-word query, while multiplying by the document word frequency makes
words that appear a lot in a document very important.)
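As a rough illustration of that weighting (a sketch only, not the actual
htdig scoring code; the function name, the base weight, and the sample
counts are all made up for the example):

```python
def word_score(base_weight, global_freq, doc_freq=None):
    """Toy per-word score, not the real htdig formula.

    base_weight: hypothetical weight for where the word was found
                 (body text, title, and so on).
    global_freq: occurrences of the word across the whole index.
    doc_freq:    occurrences within this one document, if known.
                 This is the part that can't be calculated right now,
                 so it is optional here.
    """
    # Dividing by the global frequency pushes common words down.
    score = base_weight / max(global_freq, 1)
    # Multiplying by the document frequency would push up words that
    # appear a lot in this particular document.
    if doc_freq is not None:
        score *= doc_freq
    return score


# Hypothetical counts: a very common word versus a rare one.
common = word_score(1.0, global_freq=10000)
rare = word_score(1.0, global_freq=20)
print(common < rare)  # the rare word carries more weight in the query
```

Note how small both numbers end up once the division is in place, which is
why any factor that isn't scaled the same way can swamp the word scores.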
I think something that would really help is to decide on some sort of test
corpus. Previously I used *.htdig.org, which currently numbers around
14,000 URLs, mostly e-mail messages. Does this seem like a reasonable test?
If so, we can try setting the *_factors based on that.
It would probably help if people started tinkering themselves. Remember
that in 3.2, you can set them on-the-fly, which will make this exercise
much easier. I'd start by setting backlink_factor to 0.1 for now.
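To see why a smaller backlink_factor should help, here is a toy ranking
(a sketch under assumptions: it simply adds the word score to
backlink_factor times a per-document backlink value, which may not match
what htsearch really does, and the two documents and their numbers are
invented):

```python
def rank(docs, backlink_factor):
    """Order documents by a toy combined score, highest first.

    Assumes score = word_score + backlink_factor * backlink_ratio,
    an illustrative combination rather than the actual htdig one.
    """
    scored = [
        (d["word_score"] + backlink_factor * d["backlink_ratio"], d["url"])
        for d in docs
    ]
    return [url for _, url in sorted(scored, reverse=True)]


# Hypothetical documents: "a" matches the query well but has few
# backlinks; "b" matches poorly but is heavily linked.
docs = [
    {"url": "a", "word_score": 0.5, "backlink_ratio": 0.5},
    {"url": "b", "word_score": 0.1, "backlink_ratio": 3.0},
]

print(rank(docs, 1.0))  # backlinks dominate the ordering
print(rank(docs, 0.1))  # word scores decide the ordering
```

With the factor at 1.0 the heavily linked document wins regardless of how
well it matches; at 0.1 the better match comes first.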
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/