At 6:58 PM -0500 3/18/00, [EMAIL PROTECTED] wrote:
>Looking at documentation, it does not appear that there is any option in
>either the conf file or the parameters passed to htsearch, to limit the
>number of matches which are located and sorted.  If "several thousand"
>documents match the specified words, all of these have to participate in
>sorting; there's no way to limit the number which participate.

This has been requested in the past. The biggest obstacle is that it's 
a chicken-and-egg problem. You want to cut out documents before 
scoring and sorting (preferably before even looking them up in the 
document DB). But until you have a ranking, you don't know exactly 
which ones to cut. After all, you don't want to cut out the 
best-ranked documents!

>Appears to me that I could inspect the .wordlist file produced by htdig,
>locate the records which are resulting in unwanted matches, and remove these
>prior to running htmerge.
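Here is a minimal Python sketch of that filtering step. It assumes that 
each .wordlist record is a single line whose first whitespace-separated 
field is the indexed word. That layout is an assumption, so check it 
against your own file first and keep a backup. The name filter_wordlist 
is hypothetical.

```python
import sys
from typing import Iterable, Iterator, Set


def filter_wordlist(lines: Iterable[str], unwanted: Set[str]) -> Iterator[str]:
    """Yield only the wordlist records whose word is not in `unwanted`.

    ASSUMPTION: one record per line, with the indexed word as the first
    whitespace-separated field. Verify this against your own .wordlist file.
    """
    for line in lines:
        fields = line.split(None, 1)
        word = fields[0].lower() if fields else ""
        if word not in unwanted:
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage:
    #   python filter_wordlist.py the and of < old.wordlist > new.wordlist
    unwanted_words = {w.lower() for w in sys.argv[1:]}
    sys.stdout.writelines(filter_wordlist(sys.stdin, unwanted_words))
```

The filtered file then takes the place of the original before htmerge 
runs.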

Yes, you can do this. Another good technique is to use the cut, sort, 
and uniq command-line programs to count how often each word occurs, 
then add the overused ones to the bad_words list. One reason for doing 
this is that very common words add very little information to a query.
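The counting step can also be sketched in Python, under the same 
assumption about the record layout (first whitespace-separated field is 
the word). The helper overused_words and its min_count threshold are 
hypothetical; pick a threshold that fits your index. With a 
tab-separated file, a pipeline along the lines of 
`cut -f1 WORDLIST | sort | uniq -c | sort -rn` (WORDLIST standing in for 
your .wordlist file) gives a similar ranked count.

```python
import sys
from collections import Counter
from typing import Iterable, List


def overused_words(lines: Iterable[str], min_count: int) -> List[str]:
    """Return words appearing in at least `min_count` records, most frequent first.

    ASSUMPTION: one record per line, with the word as the first
    whitespace-separated field, as in the filtering sketch.
    """
    counts = Counter()
    for line in lines:
        fields = line.split(None, 1)
        if fields:
            counts[fields[0]] += 1
    # most_common() sorts by descending count
    return [word for word, n in counts.most_common() if n >= min_count]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python overused.py 5000 < old.wordlist > candidates.txt
    # Prints one word per line, the same shape as a bad_words file.
    for word in overused_words(sys.stdin, int(sys.argv[1])):
        print(word)
```

Review the candidates by hand before adding them to bad_words, since a 
frequent word is not always a useless one for your users.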

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
