Re: Disallows in robots.txt

2000-03-24 Thread Martin Beet

 So:

 I was looking at a robots.txt file and it had a series of disallow
 instructions for various user agents, and then at the bottom was a full
 disallow:
[...]
 Wouldn't this just disallow everyone from everything?



No, it would disallow everyone except a ... d (which are bound only by
their own, specified restrictions).

From the spec:
   The robot must obey the first record in /robots.txt that contains
   a User-Agent line whose value contains the name token of the robot
   as a substring. The name comparisons are case-insensitive. If no
   such record exists, it should obey the first record with a
   User-agent line with a * value, if present. If no record satisfied
   either condition, or no records are present at all, access is
   unlimited.
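Read literally, the quoted selection rule can be sketched in a few
lines of Python. This is only an illustration of the quoted text, not
a full parser; the function name and the (token, rules) record layout
are my own:

```python
def select_record(records, robot_name):
    """Pick the record a robot must obey, per the quoted draft spec.

    records: list of (user_agent_token, rules) pairs in file order.
    Returns the rules to obey, or None if access is unlimited.
    """
    name = robot_name.lower()
    # First record whose User-agent value contains the robot's name
    # token as a substring wins (comparison is case-insensitive).
    for ua, rules in records:
        if name in ua.lower():
            return rules
    # Otherwise fall back to the first "*" record, if present.
    for ua, rules in records:
        if ua == "*":
            return rules
    return None  # no record applies: access is unlimited
```

So a robot that matches one of the earlier, named records never even
looks at the trailing blanket record.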

Regards, Martin




Re: Stemming and Wildcards in robots.txt files

2000-03-13 Thread Martin Beet

 Jonathan Knoll:
  User-agent: *
  Disallow: /cgi-bin
  Disallow: /site

 Klaus Johannes Rusch:
  /cgi-bin/test.cgi
  /siteindex.html
  would be excluded.

 But what about these paths (in the same root dir):

/foo/cgi-bin/test.cgi
/bar/user1/cgi-bin/test.sgi
/bar/user2/cgi-bin/test.cgi


 Does the wildcard function recognize specified strings elsewhere (later)
 than in the immediate beginning of a path?


The draft specification is quite clear on this: the strings are
compared octet by octet, either until the Allow / Disallow string
ends, in which case the rule matches, or until a mismatch is found,
in which case it does not. The comparison always starts at the
beginning of the path, so there is no substring matching: a
Disallow of /cgi-bin does not match /foo/cgi-bin/test.cgi. From the
spec:


   The matching process compares every octet in the path portion of
   the URL and the path from the record. [...] The match evaluates
   positively if and only if the end of the path from the record is
   reached before a difference in octets is encountered.
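The quoted matching rule amounts to a byte-prefix test. A minimal
sketch, with a function name and encoding choice of my own (the spec
talks about octets, so the comparison is done on bytes):

```python
def path_matches(rule_path, url_path):
    """Octet-by-octet match, per the quoted draft spec: the rule
    matches iff its end is reached before any differing octet."""
    rule = rule_path.encode("utf-8")
    path = url_path.encode("utf-8")
    for i, octet in enumerate(rule):
        if i >= len(path) or path[i] != octet:
            return False  # difference found before the rule ended
    return True  # end of the rule reached first: match
```

On the paths above, /cgi-bin matches /cgi-bin/test.cgi and /site
matches /siteindex.html, but /cgi-bin does not match
/foo/cgi-bin/test.cgi, because the octets differ at position 1.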

Regards, Martin
