On Sun, 19 Dec 2004, J and T wrote:

> I understand what you're saying and I completely agree with you if I had not
> read something different at the w3c.org and that Yahoo! indexed the example
> site below. (please notice Subject "Possible" problem with RobotRules?)
>
> According to this document:
>
> http://www.w3.org/TR/1998/REC-html40-19980424/appendix/notes.html#h-B.4.1.1
>
> B.4.1 Search robots
> The robots.txt file
>
> It states:
>
> Some tips: URI's are case-sensitive, and "/robots.txt" string must be all
> lower-case. Blank lines are not permitted.
>
> "Blank lines are not permitted." is stated here and I wouldn't have asked
> this question if the W3C was not the one stating this. I personally believe
> the W3C is in error, but there are a lot of people who believe the W3C is
> God here.
The W3C's error is noted in the errata for the old version of HTML 4 that you
cited, and it's corrected in the latest HTML 4 Recommendation.

http://www.w3.org/MarkUp/html40-updates/REC-html40-19980424-errata.html

    The specification reads, "Blank lines are not permitted." Blank lines
    are permitted in the robots.txt file, just not within a single
    "record". Note that the specification doesn't define record.

http://www.w3.org/TR/html4/appendix/notes.html#h-B.4.1.1

    Some tips: URI's are case-sensitive, and "/robots.txt" string must be
    all lower-case. Blank lines are not permitted within a single record
    in the "robots.txt" file.

> Isn't the W3C the authority on this
> stuff?

Not on robots.txt. The W3C's section on robots.txt is buried in an appendix
to the HTML 4 Recommendation and preceded by "The following notes are
informative, not normative."

--
Liam Quinn
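
To make the record rule above concrete, here is a minimal sketch in Python.
It is not taken from the W3C text or from any particular robot's code. It
assumes the convention from the original robots.txt description: records
are separated by one or more blank lines, and comment-only lines are
discarded rather than treated as blank. The function name split_records and
the two sample files are made up for illustration.

```python
def split_records(text):
    """Split a robots.txt body into records (lists of non-empty lines).

    Assumption: a blank line ends the current record; a line holding only
    a comment is dropped and does not count as a blank line.
    """
    records = []
    current = []
    for raw in text.splitlines():
        if not raw.strip():
            # A truly blank line closes the record collected so far.
            if current:
                records.append(current)
                current = []
            continue
        # Strip any trailing comment, then surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if line:
            current.append(line)
    if current:
        records.append(current)
    return records


# Blank line BETWEEN records: two well-formed records.
good = "User-agent: a\nDisallow: /x\n\nUser-agent: *\nDisallow: /\n"

# Blank line INSIDE a record: the User-agent line and its Disallow line
# end up in different fragments.
bad = "User-agent: *\n\nDisallow: /\n"

print(split_records(good))
print(split_records(bad))
```

With the first sample the two records come back intact. With the second,
the blank line cuts the record in two, so the Disallow line is no longer
attached to any User-agent line, which is why a blank line inside a record
is a problem while blank lines between records are fine.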