According to Neil Kohl:
> Hi, all. I've run into a problem with robots.txt directives not being
> applied properly. All of our sites have robots.txt files that allow
> htdig full access (empty Disallow:), and which may or may not place
> restrictions on other robots. Here's htdig -v -v -v -v -v output from
> a site that has no restrictions:
>
> Parsing robots.txt file using myname = htdig
> Robots.txt line: # robots.txt for Environmental Assessment
> Robots.txt line: User-agent: htdig
> Found 'user-agent' line: htdig
> Robots.txt line: Disallow:
> Found 'disallow' line:
> Robots.txt line: # Rest of world:
> Robots.txt line: User-agent: *
> Found 'user-agent' line: *
> Robots.txt line: Disallow:
> Pattern:
> 1 - Closing previous connection with the remote host
> pushed
> Rejected: forbidden by server robots.txt!
> pick: eadev.acponline.org, # servers = 1
> > eadev.acponline.org supports HTTP persistent connections (infinite)
> ht://dig End Time: Thu Nov 13 09:52:02 2003
>
>
> htdig is coming across as user agent 'htdig':
>
> 172.19.31.12 - - [13/Nov/2003:09:40:24 -0500] "HEAD /robots.txt HTTP/1.1" 200 0 "-"
> "htdig"
> 172.19.31.12 - - [13/Nov/2003:09:40:24 -0500] "GET /robots.txt HTTP/1.1" 200 138 "-"
> "htdig"
>
>
> Removing the robots.txt file results in a normal run. Any ideas on
> what's causing this?
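That would be a bug, which I unfortunately introduced while fixing another
one in this part of the code. When every Disallow: line is empty, the
pattern is empty too, so the full regex ends up as "^[^:]*://[^/]*()",
which matches every URL. See if this patch fixes the problem...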
--- htdig/Server.cc.orig 2003-10-27 17:28:52.000000000 -0600
+++ htdig/Server.cc 2003-11-13 11:31:24.000000000 -0600
@@ -338,6 +338,8 @@
     String fullpatt = "^[^:]*://[^/]*(";
     fullpatt << pattern << ')';
+    if (pattern.length() == 0)
+        fullpatt = "";
     _disallow.set(fullpatt, config->Boolean("case_sensitive"));
 }
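
If you want to see the effect outside of htdig, here is a minimal
standalone sketch of the same logic. It is not ht://Dig code: it uses
std::regex (default ECMAScript grammar) in place of ht://Dig's own
String and regex classes, the names DisallowRule, buildDisallowRule and
isDisallowed are made up for illustration, and it assumes that setting
an empty pattern means "match nothing", which is what the patch relies
on.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for the _disallow member: an inactive rule
// never matches, which is the assumed meaning of an empty pattern.
struct DisallowRule {
    bool active = false;
    std::regex re;
};

// Builds the full disallow regex from the collected Disallow: paths.
// applyFix == false mimics the unpatched code; true mimics the patch.
DisallowRule buildDisallowRule(const std::string& pattern, bool applyFix)
{
    DisallowRule rule;
    std::string fullpatt = "^[^:]*://[^/]*(";
    fullpatt += pattern;
    fullpatt += ')';
    if (applyFix && pattern.empty())
        fullpatt = "";                 // the two lines added by the patch
    if (!fullpatt.empty()) {
        rule.re = std::regex(fullpatt);
        rule.active = true;
    }
    return rule;
}

// True if the URL would be rejected by the rule.
bool isDisallowed(const DisallowRule& rule, const std::string& url)
{
    return rule.active && std::regex_search(url, rule.re);
}
```

With an empty pattern and applyFix set to false, isDisallowed() should
reject any URL of the form scheme://host/..., as in the report above.
With applyFix set to true it should reject nothing, while a non-empty
pattern such as "/private" (a made-up path) should still reject only
URLs under that path.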
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)