Gilles Detillieux wrote:
...snip...
> I think you misunderstood. htdig already does read the robots.txt file
> and skips all disallowed documents.
Whoops, my apologies for that gaffe; my brain has started the holiday
season without me {:-).
Actually, I've given up remembering how you do, or I did, anything under
Linux - with new versions every three months, it's all different every
time I look at something.
You are correct about that. I now remember having to look at this in
detail, as my robots.txt excludes the lists I archive on site from all
indexing bots, and htdig very obediently acted on this. I wanted htdig
to actually index the contents of these lists but exclude everything
else, which it now does quite nicely.
> Actually, on my site I don't bother with exclude_urls at all, and use the
> robots.txt file instead. This way, anything that I don't want indexed by
> htdig won't be indexed by any other search engine either.
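For anyone setting up something similar, robots.txt rules can be given per user-agent, so you can shut out other robots while still letting htdig in. A rough sketch (the paths are made up for illustration, and I'm assuming htdig announces itself as "htdig" - check the robotstxt_name attribute in your htdig configuration to be sure):

```
# Hypothetical example: keep all robots out of everything,
# but let htdig index the site apart from one private area.
User-agent: *
Disallow: /

User-agent: htdig
Disallow: /private/
# Everything not listed above (e.g. the list archives) is
# allowed for htdig, since robots apply their own most
# specific User-agent section only.
```

The reverse of my setup, of course - I exclude the lists from everyone else and let htdig at them - but the per-agent mechanism is the same.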
I wish all search engines did obey robots.txt.
Thanks for the development effort on htdig. Very useful app.
--
Terry Collins {:-)}}} Ph(02) 4627 2186 Fax(02) 4628 7861
email: [EMAIL PROTECTED] www: http://www.woa.com.au
WOA Computer Services <lan/wan, linux/unix, novell>
"People without trees are like fish without clean water"
------------------------------------
To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.
List archives: <http://www.htdig.org/mail/menu.html>
FAQ: <http://www.htdig.org/FAQ.html>