LWP::RobotUA doesn't bother to parse a robots.txt file if the file does not contain "Disallow". The check for "Disallow" is case sensitive, but according to the robot exclusion standard, field names are case insensitive. This causes LWP::RobotUA to ignore some robots.txt files that it should parse.
Attached is a patch that makes the check for "Disallow" case insensitive. The patch is against RobotUA.pm 1.19. -- Liam Quinn
--- LWP/RobotUA.pm.orig	Tue Sep  3 00:05:04 2002
+++ LWP/RobotUA.pm	Thu Sep 11 20:33:28 2003
@@ -218,7 +218,7 @@
     my $fresh_until = $robot_res->fresh_until;
     if ($robot_res->is_success) {
 	my $c = $robot_res->content;
-	if ($robot_res->content_type =~ m,^text/, && $c =~ /Disallow/) {
+	if ($robot_res->content_type =~ m,^text/, && $c =~ /Disallow/i) {
 	    LWP::Debug::debug("Parsing robot rules");
 	    $self->{'rules'}->parse($robot_url, $c, $fresh_until);
 	}