On Fri, 4 May 2001, Don Gourley wrote:
> I've recently built and installed htdig-3.2.0b3 and it is
> working pretty well. However, it is indexing more docs
Try the "htstat" program in the 3.2.0b3 release, e.g.
htstat -u
> Also, is my assumption correct that if a document is
> excluded via exclude_urls then no links in it are followed
If a document is excluded via exclude_urls, it is not
downloaded. Period. So there's no way of knowing what the contents are,
much less following links.
If you want the links in a document followed, but the document itself
not indexed, use the META robots tag with a content value of
"noindex,follow": <http://www.robotstxt.org/wc/meta-user.html>
> http://websource.wrlc.org:8000/voyager/stgfac/
> http://websource.wrlc.org:8000/voyager/stgfac/index.html
These are treated as identical via the remove_default_doc attribute:
<http://www.htdig.org/attrs.html#remove_default_doc>
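As a minimal sketch, the setting in the htdig configuration file would look
something like this (index.html is the attribute's default as far as I
know; index.htm is added here purely for illustration):

```
# Treat a trailing default document name as the same URL as the bare directory
remove_default_doc: index.html index.htm
```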
> http://websource.wrlc.org:8000/voyager/stgfac/?N=D
> http://websource.wrlc.org:8000/voyager/stgfac/?D=A
Personally, I turn off these sortable directory-index links in Apache.
Beyond that, I exclude these URLs using exclude_urls because it's not
useful to index different versions of an automatically-generated
directory index.
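A sketch of what that might look like in the htdig configuration file,
assuming the index links use Apache's usual sort keys (N, M, S, D) with
ascending/descending order (A, D). The /cgi-bin/ and .cgi entries are just
the usual default patterns and can be adjusted for your site:

```
# Skip CGI output and the re-sorted variants of auto-generated directory indexes
exclude_urls: /cgi-bin/ .cgi ?N=A ?N=D ?M=A ?M=D ?S=A ?S=D ?D=A ?D=D
```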
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/