WWW::RobotRules attempts to trim the robot's User-Agent before comparing it with the User-agent field of a robots.txt file:
    # Strip it so that it's just the short name.
    # I.e., "FooBot"     => "FooBot"
    #       "FooBot/1.2" => "FooBot"
    #       "FooBot/1.2 [http://foobot.int; [EMAIL PROTECTED]" => "FooBot"

    delete $self->{'loc'};   # all old info is now stale
    $name = $1 if $name =~ m/(\S+)/; # get first word
    $name =~ s!/?\s*\d+.\d+\s*$!!; # loose version

My robot's name is "WDG_SiteValidator/1.5.5". The above code changes the
name to "WDG_SiteValidator/1.", which causes it not to match a robots.txt
User-agent field of "WDG_SiteValidator".

I've attached a patch against WWW::RobotRules 1.23 that replaces the last
line above with

    $name =~ s!/.*!!; # loose version

which seems to cover the various cases correctly.

-- 
Liam Quinn
--- WWW/RobotRules.pm.orig	Sat Aug 17 23:32:07 2002
+++ WWW/RobotRules.pm	Thu Sep 11 20:55:39 2003
@@ -254,7 +254,7 @@
 	delete $self->{'loc'};   # all old info is now stale
 	$name = $1 if $name =~ m/(\S+)/; # get first word
-	$name =~ s!/?\s*\d+.\d+\s*$!!; # loose version
+	$name =~ s!/.*!!; # loose version
 	$self->{'ua'}=$name;
     }
     $old;
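To illustrate why the old pattern misbehaves, here is a small sketch of the two substitutions translated into Python's re module (the function names strip_old and strip_new are mine, for demonstration; the actual code is the Perl shown above). The key detail is the unescaped "." in the old pattern, which matches any character, so on "WDG_SiteValidator/1.5.5" it strips only the trailing "5.5":

```python
import re

def strip_old(name):
    # Old WWW::RobotRules 1.23 behaviour: s!/?\s*\d+.\d+\s*$!!
    # The unescaped '.' matches any character, so only a trailing
    # digits-char-digits run is removed.
    return re.sub(r'/?\s*\d+.\d+\s*$', '', name)

def strip_new(name):
    # Proposed fix: s!/.*!! -- drop everything from the first '/' on.
    return re.sub(r'/.*', '', name)

print(strip_old("WDG_SiteValidator/1.5.5"))  # WDG_SiteValidator/1.
print(strip_new("WDG_SiteValidator/1.5.5"))  # WDG_SiteValidator
print(strip_new("FooBot/1.2"))               # FooBot
print(strip_new("FooBot"))                   # FooBot
```

With the fix, a short name, a name with a version, and a name with a version plus trailing comment all reduce to the bare short name, which is what the robots.txt comparison expects.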