According to Terry Collins:
> Geoff Hutchison wrote:
> >
> > At 10:14 AM +1100 12/19/00, Terry Collins wrote:
> > >And make sure you don't ignore robots.txt
> >
> > Yes, though someone would need to alter the code to do this.
>
> If you are doing an external site, it shouldn't be too much effort to
> just read this and set the excludes.
>
> Courtesy thing.
I think you misunderstood. htdig already does read the robots.txt file
and skips all disallowed documents. You don't need to do this manually.
Geoff was saying you'd need to alter the code in order to ignore robots.txt,
which definitely would be a bad thing if you then use the hacked htdig to
index sites that are not your own.
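To illustrate what a polite crawler like htdig does internally (this is not htdig's actual C++ code, just a sketch using Python's standard urllib.robotparser module, with a made-up robots.txt):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
robots_txt = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""

# Parse the rules and check each URL before fetching it,
# exactly the courtesy behaviour the robots exclusion protocol asks for.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

for url in ("http://example.com/index.html",
            "http://example.com/private/notes.html"):
    allowed = parser.can_fetch("htdig", url)
    print(url, "->", "fetch" if allowed else "skip")
```

A crawler hacked to skip this check would fetch the disallowed URLs too, which is exactly the discourtesy being warned against above.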
Actually, on my site I don't bother with exclude_urls at all, and use the
robots.txt file instead. This way, anything that I don't want indexed by
htdig won't be indexed by any other search engine either.
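As a sketch of that setup (the paths here are hypothetical), a single robots.txt at the document root covers htdig and every other well-behaved crawler, with no exclude_urls entry needed in htdig.conf:

```
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
```

If I remember correctly, htdig identifies itself as "htdig", so you can also add a separate "User-agent: htdig" section if you ever want rules that apply only to your own indexer.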
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
------------------------------------
To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.
List archives: <http://www.htdig.org/mail/menu.html>
FAQ: <http://www.htdig.org/FAQ.html>