According to Neil Kohl:
> Hi, all. I've run into a problem with robots.txt directives not being
> applied properly. All of our sites have robots.txt files that allow
> htdig full access (empty Disallow:), and which may or may not place
> restrictions on other robots. Here's htdig -v -v -v -v -v output from
> a site that has no restrictions:
> 
> Parsing robots.txt file using myname = htdig
> Robots.txt line: # robots.txt for Environmental Assessment
> Robots.txt line: User-agent: htdig
> Found 'user-agent' line: htdig
> Robots.txt line: Disallow:
> Found 'disallow' line: 
> Robots.txt line: # Rest of world:
> Robots.txt line: User-agent: *
> Found 'user-agent' line: *
> Robots.txt line: Disallow:
> Pattern: 
>     1 - Closing previous connection with the remote host
>  pushed
>    Rejected: forbidden by server robots.txt!
> pick: eadev.acponline.org, # servers = 1
> > eadev.acponline.org supports HTTP persistent connections (infinite)
> ht://dig End Time: Thu Nov 13 09:52:02 2003
> 
> 
> htdig is coming across as user agent 'htdig': 
> 
> 172.19.31.12 - - [13/Nov/2003:09:40:24 -0500] "HEAD /robots.txt HTTP/1.1" 200 0 "-" 
> "htdig"
> 172.19.31.12 - - [13/Nov/2003:09:40:24 -0500] "GET /robots.txt HTTP/1.1" 200 138 "-" 
> "htdig"
>  
> 
> Removing the robots.txt file results in a normal run. Any ideas on
> what's causing this?

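That would be a bug, which I unfortunately introduced while fixing another
one in this part of the code. With an empty Disallow: the pattern is empty,
so the full pattern becomes "^[^:]*://[^/]*()", which matches every URL and
gets everything rejected. See if this patch fixes the problem...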

--- htdig/Server.cc.orig        2003-10-27 17:28:52.000000000 -0600
+++ htdig/Server.cc     2003-11-13 11:31:24.000000000 -0600
@@ -338,6 +338,8 @@
                
     String     fullpatt = "^[^:]*://[^/]*(";
     fullpatt << pattern << ')';
+    if (pattern.length() == 0)
+       fullpatt = "";
     _disallow.set(fullpatt, config->Boolean("case_sensitive"));
 }
 
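To see why the empty pattern matters, here is a rough standalone sketch of
the same logic. It is not the ht://Dig code: it uses std::regex in place of
ht://Dig's own String and regex classes, the helper names
(build_disallow_pattern, is_disallowed, check_examples) and the "/private/"
path are made up for illustration, and it assumes that an empty pattern
handed to _disallow.set() means "match nothing", which is what the patch
relies on.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: builds the full disallow pattern the way the patched
// code does. With patched == false it mimics the behaviour before the patch.
std::string build_disallow_pattern(const std::string& pattern, bool patched) {
    std::string fullpatt = "^[^:]*://[^/]*(";
    fullpatt += pattern + ")";
    if (patched && pattern.empty())
        fullpatt = "";  // nothing to disallow, so no pattern at all
    return fullpatt;
}

// Hypothetical stand-in for the disallow check. ASSUMPTION: an empty
// pattern means "no URL is disallowed".
bool is_disallowed(const std::string& fullpatt, const std::string& url) {
    if (fullpatt.empty())
        return false;
    return std::regex_search(url, std::regex(fullpatt));
}

// Self-check for the cases discussed in this thread.
void check_examples() {
    const std::string url = "http://eadev.acponline.org/";  // illustrative URL

    // Before the patch: "^[^:]*://[^/]*()" matches any URL, so all are rejected.
    assert(is_disallowed(build_disallow_pattern("", false), url));

    // After the patch: an empty Disallow: rejects nothing.
    assert(!is_disallowed(build_disallow_pattern("", true), url));

    // A non-empty pattern still only rejects the paths it names.
    assert(!is_disallowed(build_disallow_pattern("/private/", true), url));
    assert(is_disallowed(build_disallow_pattern("/private/", true),
                         url + "private/page.html"));
}
```

Calling check_examples() from a small main() should pass silently. The first
assertion is the behaviour reported above: an empty group after the host part
matches unconditionally.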

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

