The current behaviour of LWP::RobotUA, when it is passed an existing WWW::RobotRules::InCore object, is counterintuitive to me.
I am of this opinion because of the documentation of $rules in LWP::RobotUA->new() and WWW::RobotRules->agent(), as well as the implementation in WWW::RobotRules::AnyDBM_File. Currently, W::R::InCore always empties the cache when agent() is called, regardless of whether the agent name has changed. W::R::AnyDBM_File does not seem to have this problem. I suggest applying the attached patch to fix this.

Additionally, I see that InCore and AnyDBM_File use different algorithms for getting the "short" agent name from the full one, with the AnyDBM_File one looking "older". Perhaps a new method/function for this (e.g. short_agent()) could be added to WWW::RobotRules and used in both InCore and AnyDBM_File?

While on the subject of robots, applying something like the "warning could be more helpful" change from http://www.xray.mpe.mpg.de/mailing-lists/libwww-perl/2004-08/msg00024.html would be most welcome.
Index: lib/WWW/RobotRules.pm
===================================================================
RCS file: /cvsroot/libwww-perl/lwp5/lib/WWW/RobotRules.pm,v
retrieving revision 1.30
diff -a -u -r1.30 RobotRules.pm
--- lib/WWW/RobotRules.pm	9 Apr 2004 15:09:14 -0000	1.30
+++ lib/WWW/RobotRules.pm	12 Oct 2004 06:39:34 -0000
@@ -185,10 +185,12 @@
 	# "FooBot/1.2" => "FooBot"
 	# "FooBot/1.2 [http://foobot.int; [EMAIL PROTECTED]" => "FooBot"
 
-	delete $self->{'loc'}; # all old info is now stale
 	$name = $1 if $name =~ m/(\S+)/; # get first word
 	$name =~ s!/.*!!; # get rid of version
-	$self->{'ua'}=$name;
+	unless ($old && $old eq $name) {
+	    delete $self->{'loc'}; # all old info is now stale
+	    $self->{'ua'} = $name;
+	}
     }
     $old;
 }
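
To make the short_agent() suggestion a bit more concrete, here is a rough sketch of what such a helper might look like. It is written as a standalone function so it can be tried on its own. The name short_agent() and its placement are only my proposal and do not exist in WWW::RobotRules today. The two steps are the ones the InCore agent() code in the patch above already performs.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::RobotRules): reduce a full
# User-Agent string to its "short" name.
sub short_agent {
    my ($name) = @_;
    return unless defined $name;
    $name = $1 if $name =~ m/(\S+)/;   # keep only the first word
    $name =~ s!/.*!!;                  # strip "/version" and anything after it
    return $name;
}

print short_agent("FooBot/1.2"), "\n";
print short_agent("FooBot/1.2 [http://foobot.int]"), "\n";
```

If something like this were shared, both InCore and AnyDBM_File could call it from agent(), so the two backends would always agree on the short name.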