The current behaviour of LWP::RobotUA, when it is passed an existing
WWW::RobotRules::InCore object, is counterintuitive to me.

I am of this opinion because of the documentation of $rules in
LWP::RobotUA->new() and WWW::RobotRules->agent(), as well as the
implementation in WWW::RobotRules::AnyDBM_File.

Currently, W::R::InCore always empties its cache when agent() is called,
regardless of whether the agent name has changed.  W::R::AnyDBM_File does
not seem to have this problem.

I suggest applying the attached patch to fix this.

Additionally, I see that InCore and AnyDBM_File use different algorithms
for getting the "short" agent name from the full one, with the AnyDBM_File
one looking "older".  Perhaps add a new method/function for this (e.g.
short_agent()) in WWW::RobotRules that could be used in both InCore and
AnyDBM_File?
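
To illustrate, here is a rough sketch of what such a shared helper might
look like.  The name short_agent() is only my suggestion, not an existing
WWW::RobotRules function, and the body simply reuses the two steps that
InCore's agent() already performs (first word, then strip the version);
whether AnyDBM_File should adopt exactly this logic is an open question.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::RobotRules): reduce a full
# agent string to its "short" name, mirroring InCore's agent() logic.
sub short_agent {
    my ($name) = @_;
    return unless defined $name;

    $name = $1 if $name =~ m/(\S+)/;   # keep only the first word
    $name =~ s!/.*!!;                  # get rid of the version
    return $name;
}

# The address in the last string is made up for this example.
for my $full ('FooBot', 'FooBot/1.2',
              'FooBot/1.2 [http://foobot.int; bot@foobot.int]') {
    print short_agent($full), "\n";    # prints "FooBot" each time
}
```

Both backends could then call this one function instead of each carrying
its own copy of the stripping code.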

While on the subject of robots, applying something like the "warning could
be more helpful" change from
http://www.xray.mpe.mpg.de/mailing-lists/libwww-perl/2004-08/msg00024.html
would be most welcome.

Index: lib/WWW/RobotRules.pm
===================================================================
RCS file: /cvsroot/libwww-perl/lwp5/lib/WWW/RobotRules.pm,v
retrieving revision 1.30
diff -a -u -r1.30 RobotRules.pm
--- lib/WWW/RobotRules.pm	9 Apr 2004 15:09:14 -0000	1.30
+++ lib/WWW/RobotRules.pm	12 Oct 2004 06:39:34 -0000
@@ -185,10 +185,12 @@
         #       "FooBot/1.2"                                  => "FooBot"
         #       "FooBot/1.2 [http://foobot.int; [EMAIL PROTECTED]" => "FooBot"
 
-	delete $self->{'loc'};   # all old info is now stale
 	$name = $1 if $name =~ m/(\S+)/; # get first word
 	$name =~ s!/.*!!;  # get rid of version
-	$self->{'ua'}=$name;
+	unless ($old && $old eq $name) {
+	    delete $self->{'loc'}; # all old info is now stale
+	    $self->{'ua'} = $name;
+	}
     }
     $old;
 }
