I'm a bit perplexed about whether the current Perl library WWW::RobotRules implements a certain part of the Robots Exclusion Standard correctly. Forgive me if this seems a simple question, but my reading of the standard hasn't really cleared it up in my mind yet.

Basically, the current WWW::RobotRules logic is this: as a WWW::RobotRules object parses the lines of the robots.txt file, when it sees a line that says "User-Agent: ...foo...", it extracts the "...foo..." part, and if the name of the current user-agent is a substring of "...foo...", it considers that line as applying to it. So if the agent being modeled is called "Banjo", and the robots.txt line being parsed says "User-Agent: Thing, Woozle, Banjo, Stuff", then the library says "OK, 'Banjo' is a substring of 'Thing, Woozle, Banjo, Stuff', so this rule is talking to me!"

However, the substring matching currently goes only one way. So if the user-agent object is called "Banjo/1.1 [http://nowhere.int/banjo.html [EMAIL PROTECTED]]" and the robots.txt line being parsed says "User-Agent: Thing, Woozle, Banjo, Stuff", then the library says "'Banjo/1.1 [http://nowhere.int/banjo.html [EMAIL PROTECTED]]' is NOT a substring of 'Thing, Woozle, Banjo, Stuff', so this rule is NOT talking to me!"

I have the feeling that that's not right -- notably because it means that every robot's ID string has to appear in toto on the "User-Agent" line in robots.txt, which is clearly a bad thing.

But before I submit a patch, I'm tempted to ask... what /is/ the proper behavior? Maybe shave the current user-agent's name at the first slash or space (getting just "Banjo"), and then see whether /that/ is a substring of a given robots.txt "User-Agent:" line?

--
Sean M. Burke [EMAIL PROTECTED] http://www.spinn.net/~sburke/

--
This message was sent by the Internet robots and spiders discussion list ([EMAIL PROTECTED]). For list server commands, send "help" in the body of a message to "[EMAIL PROTECTED]".
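
To make the two behaviors concrete, here is a rough sketch in Python. It is not the actual WWW::RobotRules code (which is Perl), and the function names `matches_current`, `short_name`, and `matches_proposed` are made up for illustration. It only models the one-way substring test described above and the "shave at the first slash or space" idea, and it assumes a plain case-sensitive comparison.

```python
import re


def matches_current(agent_name: str, ua_line_value: str) -> bool:
    """One-way test as described in the post: the agent's full name
    must appear as a substring of the User-Agent line's value."""
    return agent_name in ua_line_value


def short_name(agent_name: str) -> str:
    """Shave the agent name at the first slash or whitespace character."""
    return re.split(r"[/\s]", agent_name, maxsplit=1)[0]


def matches_proposed(agent_name: str, ua_line_value: str) -> bool:
    """Proposed variant: test only the shaved name against the line."""
    return short_name(agent_name) in ua_line_value


line = "Thing, Woozle, Banjo, Stuff"
full = "Banjo/1.1 [http://nowhere.int/banjo.html [EMAIL PROTECTED]]"

print(matches_current("Banjo", line))  # short name: rule applies
print(matches_current(full, line))     # full ID string: rule does not apply
print(matches_proposed(full, line))    # shaved to "Banjo": rule applies
```

Whether shaving the name like this is what the standard actually intends is exactly the open question; the sketch only shows what such a patch would change.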