Bill Moseley <[EMAIL PROTECTED]> writes:

> I've got a spider that uses LWP::RobotUA (WWW::RobotRules) and a few
> users of the spider have complained that the warning messages were
> not obvious enough.  I guess I can agree because when they are
> spidering multiple hosts the message doesn't tell them what robots.txt
> had a problem.

The patch I've now applied is this one:

Index: lib/WWW/RobotRules.pm
===================================================================
RCS file: /cvsroot/libwww-perl/lwp5/lib/WWW/RobotRules.pm,v
retrieving revision 1.31
retrieving revision 1.32
diff -u -p -u -r1.31 -r1.32
--- lib/WWW/RobotRules.pm   12 Nov 2004 16:05:09 -0000   1.31
+++ lib/WWW/RobotRules.pm   12 Nov 2004 16:14:25 -0000   1.32
@@ -1,8 +1,8 @@
 package WWW::RobotRules;
 
-# $Id: RobotRules.pm,v 1.31 2004/11/12 16:05:09 gisle Exp $
+# $Id: RobotRules.pm,v 1.32 2004/11/12 16:14:25 gisle Exp $
 
-$VERSION = sprintf("%d.%02d", q$Revision: 1.31 $ =~ /(\d+)\.(\d+)/);
+$VERSION = sprintf("%d.%02d", q$Revision: 1.32 $ =~ /(\d+)\.(\d+)/);
 sub Version { $VERSION; }
 
 use strict;
@@ -70,7 +70,7 @@ sub parse {
         }
         elsif (/^\s*Disallow\s*:\s*(.*)/i) {
             unless (defined $ua) {
-                warn "RobotRules: Disallow without preceding User-agent\n";
+                warn "RobotRules <$robot_txt_uri>: Disallow without preceding User-agent\n" if $^W;
                 $is_anon = 1;  # assume that User-agent: * was intended
             }
             my $disallow = $1;
@@ -97,7 +97,7 @@ sub parse {
             }
         }
         else {
-            warn "RobotRules: Unexpected line: $_\n";
+            warn "RobotRules <$robot_txt_uri>: Unexpected line: $_\n" if $^W;
         }
     }

> So maybe something like:
>
> --- RobotRules.pm.old   2004-04-09 08:37:08.000000000 -0700
> +++ RobotRules.pm       2004-09-16 09:46:03.000000000 -0700
> @@ -70,7 +70,7 @@
>          }
>          elsif (/^\s*Disallow\s*:\s*(.*)/i) {
>              unless (defined $ua) {
> -                warn "RobotRules: Disallow without preceding User-agent\n";
> +                warn "RobotRules: [$robot_txt_uri] Disallow without preceding User-agent\n";
>                  $is_anon = 1;  # assume that User-agent: * was intended
>              }
>              my $disallow = $1;
> @@ -97,7 +97,7 @@
>              }
>          }
>          else {
> -            warn "RobotRules: Unexpected line: $_\n";
> +            warn "RobotRules: [$robot_txt_uri] Unexpected line: $_\n";
>          }
>      }
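
Note that the applied patch also makes both messages conditional on $^W, so a spider only sees them when warnings are switched on (for example with perl -w). A minimal sketch of how that might be checked, assuming a WWW::RobotRules at revision 1.32 or later is installed; the robots_warnings helper, the ExampleSpider agent name and the example.com URL are made up for illustration and are not part of the module:

```perl
use strict;
use warnings;
use WWW::RobotRules;

# Hypothetical helper (not part of WWW::RobotRules): parse $txt as the
# robots.txt fetched from $uri and return whatever warnings were emitted.
sub robots_warnings {
    my ($uri, $txt, $want_warnings) = @_;
    my @seen;
    local $^W = $want_warnings ? 1 : 0;             # the patched warn() calls test $^W
    local $SIG{__WARN__} = sub { push @seen, @_ };  # capture instead of printing to STDERR
    my $rules = WWW::RobotRules->new('ExampleSpider/1.0');  # made-up agent name
    $rules->parse($uri, $txt);
    return @seen;
}

my $uri = 'http://example.com/robots.txt';
my $txt = "Disallow: /private\n";   # Disallow with no User-agent line before it

print "with \$^W:    $_" for robots_warnings($uri, $txt, 1);
print "without \$^W: $_" for robots_warnings($uri, $txt, 0);
```

With $^W set, the captured message should name the robots.txt URI that was being parsed; with it cleared, the parser should stay silent about the stray Disallow line.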