I've got a spider that uses LWP::RobotUA (WWW::RobotRules), and a few
users of the spider have complained that the warning messages are not
obvious enough.  I guess I can agree: when they are spidering multiple
hosts, the message doesn't tell them which robots.txt had the problem.
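
For context, here's roughly what the failure mode looks like -- just a
sketch, with made-up hosts and agent string:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;

# Hypothetical hosts; either one might serve a malformed robots.txt.
my @urls = qw(
    http://host-a.example.com/
    http://host-b.example.com/
);

my $ua = LWP::RobotUA->new('my-spider/0.1', 'me@example.com');
$ua->delay(0);    # minutes between requests to the same host

for my $url (@urls) {
    # get() fetches and parses each host's robots.txt as needed
    my $res = $ua->get($url);
    print "$url: ", $res->status_line, "\n";
}

# If either robots.txt contains a bogus line (a bare "Crawl-delay: 10"
# is a common one), you currently get something like
#
#   RobotRules: Unexpected line: Crawl-delay: 10
#
# on STDERR, with nothing to say which host's file it came from.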

So maybe something like:

--- RobotRules.pm.old   2004-04-09 08:37:08.000000000 -0700
+++ RobotRules.pm       2004-09-16 09:46:03.000000000 -0700
@@ -70,7 +70,7 @@
        }
        elsif (/^\s*Disallow\s*:\s*(.*)/i) {
            unless (defined $ua) {
-               warn "RobotRules: Disallow without preceding User-agent\n";
+               warn "RobotRules: [$robot_txt_uri] Disallow without preceding 
User-agent\n";
                $is_anon = 1;  # assume that User-agent: * was intended
            }
            my $disallow = $1;
@@ -97,7 +97,7 @@
            }
        }
        else {
-           warn "RobotRules: Unexpected line: $_\n";
+           warn "RobotRules: [$robot_txt_uri] Unexpected line: $_\n";
        }
     }
 
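With the change, the warning would pin down which file was bad, e.g.
(made-up host again):

  RobotRules: [http://host-a.example.com/robots.txt] Unexpected line: Crawl-delay: 10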

-- 
Bill Moseley
[EMAIL PROTECTED]
