Bill Moseley <[EMAIL PROTECTED]> writes:

> I've got a spider that uses LWP::RobotUA (WWW::RobotRules) and a few
> users of the spider have complained that the warning messages were
> not obvious enough.  I guess I can agree because when they are
> spidering multiple hosts the message doesn't tell them which robots.txt
> had a problem.

The patch I've now applied is this one:

Index: lib/WWW/RobotRules.pm
===================================================================
RCS file: /cvsroot/libwww-perl/lwp5/lib/WWW/RobotRules.pm,v
retrieving revision 1.31
retrieving revision 1.32
diff -u -p -u -r1.31 -r1.32
--- lib/WWW/RobotRules.pm       12 Nov 2004 16:05:09 -0000      1.31
+++ lib/WWW/RobotRules.pm       12 Nov 2004 16:14:25 -0000      1.32
@@ -1,8 +1,8 @@
 package WWW::RobotRules;

-# $Id: RobotRules.pm,v 1.31 2004/11/12 16:05:09 gisle Exp $
+# $Id: RobotRules.pm,v 1.32 2004/11/12 16:14:25 gisle Exp $

-$VERSION = sprintf("%d.%02d", q$Revision: 1.31 $ =~ /(\d+)\.(\d+)/);
+$VERSION = sprintf("%d.%02d", q$Revision: 1.32 $ =~ /(\d+)\.(\d+)/);
 sub Version { $VERSION; }

 use strict;
@@ -70,7 +70,7 @@ sub parse {
        }
        elsif (/^\s*Disallow\s*:\s*(.*)/i) {
            unless (defined $ua) {
-               warn "RobotRules: Disallow without preceding User-agent\n";
+               warn "RobotRules <$robot_txt_uri>: Disallow without preceding User-agent\n" if $^W;
                $is_anon = 1;  # assume that User-agent: * was intended
            }
            my $disallow = $1;
@@ -97,7 +97,7 @@ sub parse {
            }
        }
        else {
-           warn "RobotRules: Unexpected line: $_\n";
+           warn "RobotRules <$robot_txt_uri>: Unexpected line: $_\n" if $^W;
        }
     }
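
Note that with the "if $^W" guard these messages only appear when warnings
are enabled globally (for example by running perl with -w). A minimal sketch
of that behaviour, using core Perl only: check_line() and the example.com URI
are hypothetical stand-ins for illustration, not the actual code in
WWW::RobotRules::parse.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the patched checks in WWW::RobotRules::parse.
# It is not the module's code, only the same "warn ... if $^W" pattern.
sub check_line {
    my ($robot_txt_uri, $line) = @_;
    return 1 if $line =~ /^\s*(?:#.*)?$/;                      # blank or comment
    return 1 if $line =~ /^\s*(?:User-agent|Disallow)\s*:/i;   # known directive
    warn "RobotRules <$robot_txt_uri>: Unexpected line: $line\n" if $^W;
    return 0;
}

# Collect warnings instead of printing them, so the effect can be inspected.
my @seen;
local $SIG{__WARN__} = sub { push @seen, @_ };

my $uri = "http://example.com/robots.txt";   # placeholder URI (assumption)

{
    local $^W = 0;                  # as when perl runs without -w
    check_line($uri, "Bogus line");
}
my $quiet_count = @seen;            # warnings recorded while $^W was off

{
    local $^W = 1;                  # as when perl runs with -w
    check_line($uri, "Bogus line");
}
my $loud_count = @seen;             # warnings recorded after $^W was on

print "$quiet_count $loud_count\n";
```

A spider that wants to log these messages per host could install a similar
$SIG{__WARN__} handler around its robots.txt parsing and turn $^W on locally.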

> So maybe something like:
> 
> --- RobotRules.pm.old   2004-04-09 08:37:08.000000000 -0700
> +++ RobotRules.pm       2004-09-16 09:46:03.000000000 -0700
> @@ -70,7 +70,7 @@
>         }
>         elsif (/^\s*Disallow\s*:\s*(.*)/i) {
>             unless (defined $ua) {
> -               warn "RobotRules: Disallow without preceding User-agent\n";
> +               warn "RobotRules: [$robot_txt_uri] Disallow without preceding User-agent\n";
>                 $is_anon = 1;  # assume that User-agent: * was intended
>             }
>             my $disallow = $1;
> @@ -97,7 +97,7 @@
>             }
>         }
>         else {
> -           warn "RobotRules: Unexpected line: $_\n";
> +           warn "RobotRules: [$robot_txt_uri] Unexpected line: $_\n";
>         }
>      }
