Gilles Detillieux wrote:

...snip...

> I think you misunderstood.  htdig already does read the robots.txt file
> and skips all disallowed documents.

Whoops, my apologies for that gaffe; my brain has started the holiday
season without me {:-).
Actually, I've given up remembering how anything is done under Linux;
with new versions every three months, it's all different every time I
look at something.

You are correct about that, as I now remember having to look at this in
detail: my robots.txt excludes all the lists I archive on site from
indexing bots, and htdig very obediently acted on this. I wanted htdig
to actually index the contents of these lists but exclude everything
else, which it now does quite nicely.
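For anyone wanting the same split, one way to do it is with per-agent records in robots.txt: a robot follows the most specific User-agent record that matches it, so htdig (which identifies itself as "htdig" when checking robots.txt) can get its own rules while every other crawler falls under the wildcard record. The paths below are purely illustrative, not from my actual site:

```
# Hypothetical robots.txt: keep general crawlers out of the mailing
# list archives, but let htdig index them.

# Record for htdig only; it uses this instead of the "*" record.
User-agent: htdig
Disallow: /private/

# All other robots: stay out of the archives as well.
User-agent: *
Disallow: /private/
Disallow: /lists/
```

The key point is that htdig never combines its own record with the wildcard one, so anything you want it to index must simply be left out of the "htdig" record's Disallow lines.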


> Actually, on my site I don't bother with exclude_urls at all, and use the
> robots.txt file instead.  This way, anything that I don't want indexed by
> htdig won't be indexed by any other search engine either.

I wish all search engines obeyed robots.txt.

Thanks for the development effort with htdig. Very useful app.

--
   Terry Collins {:-)}}} Ph(02) 4627 2186 Fax(02) 4628 7861  
   email: [EMAIL PROTECTED]  www: http://www.woa.com.au  
   WOA Computer Services <lan/wan, linux/unix, novell>

 "People without trees are like fish without clean water"

------------------------------------
To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.
List archives:  <http://www.htdig.org/mail/menu.html>
FAQ:            <http://www.htdig.org/FAQ.html>