According to Malcolm Austen:
> On Fri, 6 Jul 2001, Gilles Detillieux wrote:
> + > http://wwwsearch.ox.ac.uk/scores.html
> +
> + That's a great writeup, and I agree with your recommendations.  We may want
> + to consider changing the defaults in upcoming releases.
> 
> Thanks for that Gilles. The people who were so keen to know how to ensure
> their pages got a good score (don't they all 8-) never did offer to debate
> the settings. As you think my suggestions are reasonable, I've just put
> them in for tonight's reindex. I'll report to the list if anything goes
> dramatically wrong!

Well, I'm no expert on how the weighting factors should be set, as I've
never experimented with settings other than the defaults.  So, whether
I think your suggestions are reasonable may not count for much.  :-)

The proof is in the pudding.  As I haven't heard back from you about this,
I'd guess that nothing went wrong.  Did you notice an improvement in
rankings, though?

> + I do want to correct one inaccuracy in the document, though.  You say:
> <snip>
> + It actually doesn't tail off one per word, but rather the factor of 0-1000
> + indicates tens of percentages from the end of the document, so it doesn't
> + actually hit zero except maybe for the last word or so of large documents.
> 
> Thanks, I evidently misunderstood when the matter came up on the list. I
> have just changed my wording to a paraphrase of yours.
> 
> + I would tend to agree that this isn't a great idea, but I'm not sure I
> + should take that out for 3.1.6.  Maybe another config attribute?
> 
> That sounds a solution. I guess you don't want to get too complicated but
> perhaps it could be an integer pair giving the weighting for the first and
> last word in the file with a linear ramp between them. He who wants to
> bias towards the later words in the document is welcome to do so!
> 
> Could you allow a single integer for a flat weighting? If the default is
> '1000 1' then any existing weighting effects will carry forward until the
> option is invoked.

Sounds reasonable, but we don't currently have "number list" handling
for attributes, so maybe this would be too much bother to implement.
I could always use StringList and then atoi() the numbers, I guess.
However, I think all we're really concerned about is controlling or
disabling the tapering off, so the initial value of 1000 can remain
fixed.  The overall scores for words are still regulated by text_factor,
so maybe all we need is a last_word_factor or something like that.
However, as there are still a lot of scoring problems in 3.1 that are
handled better in 3.2, I don't know how much effort I want to put into
kludging this up for a maintenance release.  I'm open to suggestions
from anyone, though.
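
To make the proposal concrete, here is a rough sketch of the "integer
pair with a linear ramp" idea, including the single-integer shorthand
for a flat weighting.  This is not ht://Dig code: the function names
parse_word_weights() and word_weight() are hypothetical, and I've used
plain std::string parsing rather than StringList, just to show the
arithmetic.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper, not part of ht://Dig: parse an attribute value
// such as "1000 1" (first-word and last-word weights) or "500" (a single
// integer, meaning the same weight for every word).
std::pair<int, int> parse_word_weights(const std::string &value)
{
    std::istringstream in(value);
    std::string first, last;
    in >> first;
    if (!(in >> last))
        last = first;   // only one number given: flat weighting
    return std::make_pair(std::atoi(first.c_str()),
                          std::atoi(last.c_str()));
}

// Hypothetical helper: linear ramp from weights.first at the first word
// (position 0) to weights.second at the last word (total_words - 1).
int word_weight(const std::pair<int, int> &weights,
                long position, long total_words)
{
    if (total_words <= 1)
        return weights.first;   // nothing to ramp over
    long span = total_words - 1;
    long delta = static_cast<long>(weights.second) - weights.first;
    return static_cast<int>(weights.first + delta * position / span);
}
```

With the suggested default of "1000 1", the first word of a document
gets 1000 and the last gets 1, while a value like "500" gives every
word 500.  A lone last_word_factor would amount to the same ramp with
the first value fixed at 1000.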

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930