Re: [htdig] robots.txt

Geoff Hutchison Wed, 14 Aug 2002 06:03:20 -0700


On Wednesday, August 14, 2002, at 06:01  AM, pp wrote:


> I got robots.txt like this:
>
> User-agent: *
> Disallow: /page
>
> This should stop all robots from indexing any page within /page.
> Right?

> Besides, I have defined robotstxt_name and htdig seems to

You don't need to set robotstxt_name to get htdig to recognize a 
robots.txt file. You'd have to change the code to get it to *ignore* a 
robots.txt file.

On the other hand, if you've done indexing and there are already URLs in 
your database and then you change robots.txt (or add one), at the 
moment, the existing URLs will not be thrown out of the database and 
they'll still be checked for updates.

If you're indexing from scratch and htdig seems to be ignoring the
robots.txt file, it would help to have the output from running
htdig -vvvv (which will show parsing of the robots.txt file at the
beginning) and to know what version you're using and where you got it
from (to make sure it's a bug, not someone's hacked version).
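
As a quick sanity check on the rule quoted above, here is a minimal sketch
that feeds the same two lines to Python's standard urllib.robotparser
module. This is not htdig's own parser, so it only shows how a
conventional robots.txt implementation reads the rule; the host name
www.example.com and the helper is_allowed() are assumptions made up for
the illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content quoted in the original message.
ROBOTS_TXT = """\
User-agent: *
Disallow: /page
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper for illustration; it uses Python's parser,
    not htdig's.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # www.example.com is a placeholder host, not taken from the thread.
    for path in ("/page", "/page/index.html", "/pages.html", "/other.html"):
        url = "http://www.example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "htdig", url))
```

Disallow values are matched as path prefixes, so with this parser
"Disallow: /page" also blocks /pages.html, not only URLs under /page/.
If only the directory is meant, "Disallow: /page/" is the narrower rule.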

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/


