On Wednesday, August 14, 2002, at 06:01 AM, pp wrote:
> I got robots.txt like this:
>
> User-agent: *
> Disallow: /page
>
> This should disallow all robots index all pages within /page.
> Right?
> Besides I have defines robotstxt_name and htdig seems to

You don't need to set robotstxt_name to get htdig to recognize a robots.txt file. You'd have to change the code to get it to *ignore* a robots.txt file.

On the other hand, if you've done indexing and there are already URLs in your database and then you change robots.txt (or add one), at the moment, the existing URLs will not be thrown out of the database and they'll still be checked for updates.

If you're indexing from scratch and htdig seems to be ignoring the robots.txt file, it would help to have the output from running htdig -vvvv (which will show parsing of the robots.txt file at the beginning) and to know what version you're using and where you got it from (to make sure it's a bug, not someone's hacked version).

-- 
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
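As a quick sanity check of what the quoted rule should block, here is a minimal sketch using Python's standard urllib.robotparser. This is not htdig's own parser, so it only shows how a generic robots.txt parser reads "Disallow: /page"; the helper name allowed() and the test paths are made up for illustration, and the "htdig" agent string is just a placeholder that falls through to the "*" entry.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the quoted message, inlined so nothing is fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /page
"""

# Parse the rules once; parse() takes an iterable of lines.
_parser = RobotFileParser()
_parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="htdig"):
    """Return True if `agent` may fetch `path` (hypothetical helper)."""
    return _parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Example paths are illustrative only.
    for path in ("/page", "/page/a.html", "/pages.html", "/other.html"):
        print(path, "allowed" if allowed(path) else "disallowed")
```

Note that Disallow is a prefix match in this parser, so /pages.html is blocked along with everything under /page/. If only the directory is meant, "Disallow: /page/" is the narrower rule.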

