Re: [htdig] what docs are indexed?

Geoff Hutchison Fri, 04 May 2001 08:44:46 -0700
On Fri, 4 May 2001, Don Gourley wrote:

> I've recently built and installed htdig-3.2.0b3 and it is
> working pretty well.  However, it is indexing more docs

Try the "htstat" program in the 3.2.0b3 release, which can list the URLs
in the document database, e.g.
 htstat -u

> Also, is my assumption correct that if a document is
> excluded via exclude_urls then no links in it are followed

If a document is excluded via exclude_urls, it is not
downloaded. Period. So there's no way of knowing what the contents are,
much less following links.

If you want to follow links, but not index a document, you want the META
robots tag with "noindex, follow":
<http://www.robotstxt.org/wc/meta-user.html>
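To make the difference from exclude_urls concrete, here is a minimal Python
sketch of how a crawler might read the META robots tag. It is not htdig's
actual code; the class and function names are made up for illustration, and
the only behaviour assumed is the conventional meaning of "noindex",
"nofollow" and "none".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalise them.
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for item in attr.get("content", "").split(","):
                self.directives.add(item.strip().lower())


def robots_policy(html):
    """Return (may_index, may_follow); both default to True (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    may_index = not ({"noindex", "none"} & found)
    may_follow = not ({"nofollow", "none"} & found)
    return may_index, may_follow


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(robots_policy(page))
```

The point is that the page has to be downloaded before the tag can be read,
which is why this approach can still follow links while exclude_urls cannot.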

> http://websource.wrlc.org:8000/voyager/stgfac/
> http://websource.wrlc.org:8000/voyager/stgfac/index.html

These are treated as identical via the remove_default_doc attribute:
<http://www.htdig.org/attrs.html#remove_default_doc>
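
As a rough illustration of that normalization (a sketch only, not htdig's
implementation), the Python below strips a trailing default document name
from a URL. It assumes remove_default_doc is set to "index.html";
strip_default_doc is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: remove_default_doc lists only index.html.
DEFAULT_DOCS = ("index.html",)


def strip_default_doc(url, default_docs=DEFAULT_DOCS):
    """Drop a trailing default document name so both URL forms compare equal."""
    parts = urlsplit(url)
    head, _, last = parts.path.rpartition("/")
    if last in default_docs:
        parts = parts._replace(path=head + "/")
    return urlunsplit(parts)


a = strip_default_doc("http://websource.wrlc.org:8000/voyager/stgfac/")
b = strip_default_doc("http://websource.wrlc.org:8000/voyager/stgfac/index.html")
print(a == b)
```

Note that a query string such as ?N=D is left alone by this step, so those
URLs still count as separate documents.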

> http://websource.wrlc.org:8000/voyager/stgfac/?N=D
> http://websource.wrlc.org:8000/voyager/stgfac/?D=A

Personally, I turn these column-sorting links off in Apache. Beyond that, I
exclude such URLs using exclude_urls because it's not useful to index
different versions of an automatically-generated directory index.
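
A small Python sketch of the exclusion check, assuming exclude_urls rejects
any URL that contains one of its space-separated patterns as a substring.
The pattern list here is only an example (the first two are the usual CGI
patterns, the rest target the sort links), and is_excluded is a hypothetical
name, not part of htdig.

```python
# Illustrative patterns; in the htdig config this might be written as
#   exclude_urls: /cgi-bin/ .cgi ?N= ?M= ?S= ?D=
EXCLUDE_PATTERNS = ["/cgi-bin/", ".cgi", "?N=", "?M=", "?S=", "?D="]


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """Return True if the URL contains any exclusion pattern as a substring."""
    return any(pattern in url for pattern in patterns)


urls = [
    "http://websource.wrlc.org:8000/voyager/stgfac/",
    "http://websource.wrlc.org:8000/voyager/stgfac/?N=D",
    "http://websource.wrlc.org:8000/voyager/stgfac/?D=A",
]
kept = [u for u in urls if not is_excluded(u)]
print(kept)
```

With these patterns only the plain directory URL survives, so the listing is
indexed once rather than once per sort order.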

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html