LWP::RobotUA doesn't bother to parse a robots.txt file if the file does
not contain the string "Disallow".  That check is case-sensitive, but
according to the robot exclusion standard, field names are
case-insensitive.  As a result, LWP::RobotUA ignores some robots.txt
files that it should parse (for example, ones that spell the field
"disallow:").
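A small self-contained sketch (not part of the patch; the sample robots.txt content is made up) showing how the case-sensitive match misses a lower-case field name that the standard permits, while the /i-modified match catches it:

```perl
use strict;
use warnings;

# A robots.txt body using a lower-case field name, which the
# robot exclusion standard treats the same as "Disallow:".
my $robots_txt = "User-agent: *\ndisallow: /private/\n";

# The original check: case-sensitive, so it fails to match.
my $old = ($robots_txt =~ /Disallow/)  ? 1 : 0;

# The patched check with the /i modifier: matches regardless of case.
my $new = ($robots_txt =~ /Disallow/i) ? 1 : 0;

print "old=$old new=$new\n";  # old=0 new=1
```

With the original code, $old is 0, so the rules are never handed to the parser and the file is effectively ignored; with /i, $new is 1 and parsing proceeds.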

Attached is a patch that makes the check for "Disallow" case-insensitive.
The patch is against RobotUA.pm 1.19.

-- 
Liam Quinn

--- LWP/RobotUA.pm.orig Tue Sep  3 00:05:04 2002
+++ LWP/RobotUA.pm      Thu Sep 11 20:33:28 2003
@@ -218,7 +218,7 @@
        my $fresh_until = $robot_res->fresh_until;
        if ($robot_res->is_success) {
            my $c = $robot_res->content;
-           if ($robot_res->content_type =~ m,^text/, && $c =~ /Disallow/) {
+           if ($robot_res->content_type =~ m,^text/, && $c =~ /Disallow/i) {
                LWP::Debug::debug("Parsing robot rules");
                $self->{'rules'}->parse($robot_url, $c, $fresh_until);
            }
