David Melton wrote:
> exclude_urls seems to make it entirely ignore the file. In the
> case of the index files, this means that it won't traverse the
> links to get to the message files. So, if I put "threads" and
> "maillist" in exclude_urls, I wind up with nothing at all!
A better approach is to add a <meta name="robots"
content="noindex,follow"> tag to the index files. This tells ht://Dig
and other compliant indexers to follow the links on the page but to
leave the page itself out of the search index.
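As a rough sketch of where the tag goes, here is what the head of an
index page might look like. The file layout, title, and comments are
made up for illustration; only the <meta> line comes from the advice
above, and how you get it into the generated pages depends on your
MHonArc setup.

```html
<!-- Hypothetical MHonArc index page, e.g. threads.html -->
<html>
<head>
<title>Thread Index</title>
<!-- Follow the links to the messages, but keep this page out of the index -->
<meta name="robots" content="noindex,follow">
</head>
<body>
<!-- message links generated by MHonArc go here -->
</body>
</html>
```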
> The description of description_factor is "Plain old "descriptions"
> are the text of a link pointing to a document. This factor gives
> weight to the words of these descriptions of the document."
> In the case of MHonArc index files, the link descriptions are
> the subject fields of the messages, which are exactly what I
> want it to exclude from any searches.
Descriptions are the text of links pointing to a document, and they are
indexed as part of that target document. So description_factor has
nothing to do with the message index; it's adding that text to the
messages themselves. To exclude the indexes, you'll have to do
something like I mentioned above.
The <!--htdig_noindex--> tags will work OK, but the index files will
still be included in the document database. Using the <meta> robots tag
will make sure they're not actually included.
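For comparison, a minimal sketch of the <!--htdig_noindex--> approach
wrapped around the link list in an index page. The list markup and
file names are hypothetical, and the closing <!--/htdig_noindex-->
form is assumed to be the end marker in your configuration, so check
it against your ht://Dig version.

```html
<body>
<!--htdig_noindex-->
<!-- Text in here is skipped, but the page itself still gets a database entry -->
<ul>
<li><a href="msg00001.html">Subject of first message</a></li>
<li><a href="msg00002.html">Subject of second message</a></li>
</ul>
<!--/htdig_noindex-->
</body>
```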
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/