Asking about the "exact scoring formula" is a bit strange. In short, htsearch in the 3.2 code scores on the fly, adding up the weights of all occurrences of a word in a document. (So if an occurrence of the word is treated as heading text, it gets the weight given by the heading_factor variable.) This sum is then added to any ratings from the date_factor and backlink_factor and other URL-based weightings, which are turned off by default.
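To make the description above concrete, here is a rough sketch of that kind of additive scoring. It is not taken from the htsearch source: the function names, the context labels, and the numeric weights are all made up for illustration, and the date and backlink ratings are simply passed in as precomputed numbers because how they are derived is not covered here.

```python
# Illustrative sketch only: names and values are assumptions,
# not copied from the htsearch source.

# Hypothetical per-context weights; "heading" stands in for the value
# of heading_factor, "text" for the weight of ordinary body text.
WEIGHTS = {"text": 1.0, "heading": 5.0}


def word_score(occurrences, weights=WEIGHTS):
    """Sum the weight of every occurrence of one word in one document.

    `occurrences` holds one context label per occurrence,
    e.g. ["text", "heading", "text"].
    """
    return sum(weights[context] for context in occurrences)


def document_score(occurrences, date_rating=0.0, backlink_rating=0.0):
    """Add the URL-based ratings on top of the summed word weights.

    The ratings default to 0.0, meaning no extra rating is applied.
    """
    return word_score(occurrences) + date_rating + backlink_rating
```

With these made-up weights, a word that appears twice in body text and once in a heading would score 1.0 + 5.0 + 1.0, so raising the heading weight is what makes heading occurrences count for more.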
I hope that answers your question, and I apologize for not writing sooner, but you may want to see the FAQ about e-mailing people directly, specifically:
<http://www.htdig.org/FAQ.html#q1.16>
<http://www.htdig.org/FAQ.html#q1.4>

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

On Sunday, February 3, 2002, at 05:59 AM, T.Srikanth wrote:

> Hi,
> I am doing a project in which an efficient search is to be
> implemented on educational material (PDF, PS, DOC, PPT).
>
> I am using htdig as the search engine. (ht://Dig 3.2.0b3)
>
> I used external parsers (pdftohtml, antiword) to
> convert the above formats to text and while doing this,
> I am storing the font information as well. The idea
> is to use this font information to achieve better search.
> So I used the heading option of the external parsers
> to assign weight to a word.
> But this does not seem to work well.
>
> Can you give me the exact scoring formula that is used
> by htsearch, so that I can improve the performance.
>
> Thanking you in anticipation.
>
> Srikanth.

