Hi, all. I've run into a problem with robots.txt directives not being applied 
properly. All of our sites have robots.txt files that allow htdig full access (empty 
Disallow:), and which may or may not place restrictions on other robots. Here's htdig 
-v -v -v -v -v output from a site that has no restrictions:

Parsing robots.txt file using myname = htdig
Robots.txt line: # robots.txt for Environmental Assessment
Robots.txt line: User-agent: htdig
Found 'user-agent' line: htdig
Robots.txt line: Disallow:
Found 'disallow' line: 
Robots.txt line: # Rest of world:
Robots.txt line: User-agent: *
Found 'user-agent' line: *
Robots.txt line: Disallow:
Pattern: 
    1 - Closing previous connection with the remote host
 pushed
   Rejected: forbidden by server robots.txt!
pick: eadev.acponline.org, # servers = 1
> eadev.acponline.org supports HTTP persistent connections (infinite)
ht://dig End Time: Thu Nov 13 09:52:02 2003
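For reference, here's the complete robots.txt file, pieced together from the 'Robots.txt line:' entries above (blank lines, if there were any, wouldn't show up in the trace):

# robots.txt for Environmental Assessment
User-agent: htdig
Disallow:
# Rest of world:
User-agent: *
Disallow:

Under the robots exclusion standard, an empty Disallow: value means "disallow nothing," so both records should grant full access.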


The access log confirms that htdig is identifying itself as user agent 'htdig': 

172.19.31.12 - - [13/Nov/2003:09:40:24 -0500] "HEAD /robots.txt HTTP/1.1" 200 0 "-" 
"htdig"
172.19.31.12 - - [13/Nov/2003:09:40:24 -0500] "GET /robots.txt HTTP/1.1" 200 138 "-" 
"htdig"
 

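As a sanity check on the file itself (as opposed to htdig's parser), here's a quick test using Python's standard robots.txt parser, with the hostname taken from the trace above:

import urllib.robotparser

# The robots.txt content, copied from the crawl trace.
robots_lines = [
    "# robots.txt for Environmental Assessment",
    "User-agent: htdig",
    "Disallow:",
    "# Rest of world:",
    "User-agent: *",
    "Disallow:",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_lines)

# An empty Disallow: disallows nothing, so both of these print True.
print(rp.can_fetch("htdig", "http://eadev.acponline.org/"))
print(rp.can_fetch("SomeOtherBot", "http://eadev.acponline.org/"))

Both calls print True, so a standards-compliant parser grants everyone full access; the rejection appears to come from htdig's own robots.txt handling.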
Note the empty 'Pattern:' line in the crawl trace above, followed immediately by the rejection; it looks as though the empty disallow pattern is being treated as matching every URL. Removing the robots.txt file entirely results in a normal run. Any ideas on what's causing this?


Neil Kohl
Manager, ACP Online              
American College of Physicians
[EMAIL PROTECTED]              215.351.2638, 800.523.1546 x2638



