WWW::RobotRules attempts to trim the robot's User-Agent before comparing 
it with the User-agent field of a robots.txt file:

        # Strip it so that it's just the short name.
        # I.e., "FooBot"                                      => "FooBot"
        #       "FooBot/1.2"                                  => "FooBot"
        #       "FooBot/1.2 [http://foobot.int; [EMAIL PROTECTED]]" => "FooBot"

        delete $self->{'loc'};   # all old info is now stale
        $name = $1 if $name =~ m/(\S+)/; # get first word
        $name =~ s!/?\s*\d+.\d+\s*$!!;  # loose version

My robot's name is "WDG_SiteValidator/1.5.5".  Because the "." in 
\d+.\d+ is unescaped and so matches any character, the substitution 
strips only the trailing "5.5", changing the name to 
"WDG_SiteValidator/1.", which causes it not to match a robots.txt 
User-agent field of "WDG_SiteValidator".

I've attached a patch against WWW::RobotRules 1.23 that replaces the last 
line above with

        $name =~ s!/.*!!;  # loose version

which seems to cover the various cases correctly.
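For what it's worth, here's a small standalone check (the @agents list is 
just illustrative, not part of the patch) comparing the current and 
patched substitutions:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative User-Agent strings: the examples from the module's own
# comment block plus my robot's name.
my @agents = (
    'FooBot',
    'FooBot/1.2',
    'WDG_SiteValidator/1.5.5',
);

for my $agent (@agents) {
    (my $old = $agent) =~ s!/?\s*\d+.\d+\s*$!!;  # current code: unescaped "."
    (my $new = $agent) =~ s!/.*!!;               # patched code: drop "/" onward
    print "$agent => old: '$old', new: '$new'\n";
}
```

For the first two names both substitutions yield "FooBot", but for mine 
the current code leaves "WDG_SiteValidator/1." while the patched code 
gives "WDG_SiteValidator".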

-- 
Liam Quinn

--- WWW/RobotRules.pm.orig      Sat Aug 17 23:32:07 2002
+++ WWW/RobotRules.pm   Thu Sep 11 20:55:39 2003
@@ -254,7 +254,7 @@
 
        delete $self->{'loc'};   # all old info is now stale
        $name = $1 if $name =~ m/(\S+)/; # get first word
-       $name =~ s!/?\s*\d+.\d+\s*$!!;  # loose version
+       $name =~ s!/.*!!;  # loose version
        $self->{'ua'}=$name;
     }
     $old;
