Asking about the "exact scoring formula" is a bit strange, since there
is no single formula to quote. In short, htsearch in the 3.2 code scores
on the fly, adding up the weights of all occurrences of a word in a
document. (So if an occurrence is considered a heading, it gets the
weight of the heading_factor attribute.) That sum is then added to any
ratings from the date_factor and backlink_factor and other URL-based
weightings which are turned off by default.
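
To make the additive idea concrete, here is a rough sketch in Python.
It is not the htsearch code, just an illustration of "sum the weight of
every occurrence, then add the URL-based ratings." The names FACTORS,
word_score and document_score and the weight values are made up for the
example; the real weights come from your configuration.

```python
# Illustrative only: NOT the actual htsearch implementation.
# The weights below are hypothetical placeholders, not ht://Dig defaults.
FACTORS = {"text": 1.0, "heading": 5.0}


def word_score(occurrences, factors=FACTORS):
    """Add up the weight of every occurrence of one word in one document.

    `occurrences` lists the context of each hit, e.g. "text" or "heading".
    """
    return sum(factors[kind] for kind in occurrences)


def document_score(occurrences, url_ratings=0.0, factors=FACTORS):
    """Word score plus any date/backlink/URL-based ratings (assumed 0 here)."""
    return word_score(occurrences, factors) + url_ratings


if __name__ == "__main__":
    # One word found twice in body text and once in a heading.
    print(document_score(["text", "heading", "text"]))
```

If the sketch is roughly right for your setup, a heading hit only helps
in proportion to how far the heading weight exceeds the plain text
weight, so that ratio is the first thing to check when heading
weighting "does not seem to work well."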

I hope that answers your question and I apologize for not writing 
sooner, but you may want to see the FAQ about e-mailing people directly, 
specifically:
<http://www.htdig.org/FAQ.html#q1.16>
<http://www.htdig.org/FAQ.html#q1.4>

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

On Sunday, February 3, 2002, at 05:59  AM, T.Srikanth wrote:

>
> Hi,
>       I am doing a project in which an efficient search is to be
>       implemented on educational material (PDF, PS, DOC, PPT).
>
>       I am using htdig as the search engine. (ht://Dig 3.2.0b3)
>
>       I used external parsers (pdftohtml, antiword) to
>       convert the above formats to text and while doing this,
>       I am storing the font information as well. The idea
>       is to use this font information to achieve better search.
>       So I used the heading option of the external parsers
>       to assign weight to a word.
>       But this does not seem to work well.
>
>       Can you give me the exact scoring formula that is used
>       by htsearch, so that I can improve the performance.
>
>       Thanking you in anticipation.
>
> Srikanth.
>
>

