LWP::RobotUA won't parse a robots.txt file unless the file contains the
string "Disallow".  That check is case sensitive, but according to the
robot exclusion standard, field names are case insensitive.  As a result,
LWP::RobotUA ignores some robots.txt files that it should parse (e.g.
files that spell the field "disallow:").
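A minimal sketch of the difference (the sample robots.txt content here is
hypothetical, not taken from any real site):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A valid robots.txt using a lowercase field name, which the
# standard permits.
my $c = "User-agent: *\ndisallow: /private/\n";

# The current check in RobotUA.pm is case sensitive and fails:
print "old check: ", ($c =~ /Disallow/  ? "match" : "no match"), "\n";

# The patched, case-insensitive check matches as it should:
print "new check: ", ($c =~ /Disallow/i ? "match" : "no match"), "\n";
```

Running this prints "no match" for the old check and "match" for the new
one, so the rules in such a file are never handed to the parser.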

Attached is a patch that makes the check for "Disallow" case insensitive.  
The patch is against libwww-perl 5.76 (RobotUA.pm 1.23).

-- 
Liam Quinn


--- LWP/RobotUA.pm.orig 2003-10-24 07:13:03.000000000 -0400
+++ LWP/RobotUA.pm      2004-04-03 17:59:04.000000000 -0500
@@ -126,7 +126,7 @@
        my $fresh_until = $robot_res->fresh_until;
        if ($robot_res->is_success) {
            my $c = $robot_res->content;
-           if ($robot_res->content_type =~ m,^text/, && $c =~ /Disallow/) {
+           if ($robot_res->content_type =~ m,^text/, && $c =~ /Disallow/i) {
                LWP::Debug::debug("Parsing robot rules");
                $self->{'rules'}->parse($robot_url, $c, $fresh_until);
            }
