Mark Crispin <[EMAIL PROTECTED]> writes:
> On Fri, 21 Feb 2003, Russ Allbery wrote:

>> Usenet's restrictions on the syntax of message ID headers are very
>> specific and very precise, and much stronger than those of RFC 2822, in
>> part because message IDs are used as part of the NNTP protocol.

> What are those restrictions?

The primary ones are:

 * Absolutely no occurrences of either whitespace or the ">" character,
   escaped or not, are permitted inside the message ID. Either is known
   to break existing software in various ways.

 * Nothing is permitted in the Message-ID header other than the message
   ID itself. Comments either preceding or following the message ID will
   cause the message to be rejected by many news servers.

 * The message ID must not be longer than about 500 characters. The
   failure mode for violating this rule tends to be rather nasty for some
   existing NNTP software, including things like desynchronization of the
   protocol between the client and server. NNTP (unfortunately) has a
   maximum command length defined as part of the protocol. In practice,
   many news servers enforce a 250 octet limit (including the surrounding
   angle brackets).

Please note that I'm not arguing that these restrictions are desirable,
simply that violating them *will* break existing news software. I also
don't think that fixing one and possibly two is really worth the effort,
since there isn't much in the way of useful purpose served by not
following those rules anyway.

>> Comments in various places that mail supports them are not
>> well-supported by currently deployed Usenet software (although it
>> certainly hurts nothing to support them when writing new code, other
>> than adding complexity). The space after the colon in headers is not
>> optional on Usenet. The syntax of the Date header is restricted in
>> ways somewhat similar to that of the Message-ID header.

> Golly gee, where's the chorus of "these are bugs that should be fixed"
> now?

Are you expecting me to serve as the chorus?
I certainly hope that you're not expecting me to try to be consistent with
statements made by other people that I don't necessarily agree with. I
tend to hold my own opinions and not necessarily agree with other
people. :)

> First we hear the claim that 7-bit messaging restrictions in mail are a
> "bug that should be fixed" even though 7-bit was specifically in the
> standard.

> Now we hear the claim that completely unnecessary restrictions in headers
> are necessary because of news software.

These restrictions are published in RFC 1036, so I would not expect them
to be a surprise to news implementors. Usenet has, since B news, used a
subset of the mail messaging format. RFC 1036 is unfortunately imprecise
about precisely what additional restrictions it put on the message format,
but at the least the space after the colon in headers is quite explicit.
The message ID restrictions are also fairly clear apart from the length
limitation (which falls out of the NNTP protocol instead). (The bit in
RFC 1036 about slashes being strongly discouraged in message IDs is now
completely obsolete.)

The Date specification in RFC 1036 is obnoxious, referring to a particular
software implementation that isn't documented as part of the standard. In
practice, an RFC 2822 date that doesn't use any of the obsolete syntax is
fine provided that the header is not folded.

Issues surrounding comments are more complex. Apart from Date and
Message-ID, which are the most sensitive headers that are also shared with
mail, comments in References headers are unlikely to cause catastrophic
problems but may show up as oddities in the thread tree in a news reader,
and news software can be fairly picky about the From header (although one
is likely fine as long as one avoids the obsolete syntax rules).

> And the IETF/IESG is supposed to respect this?

My message was solely addressing the differences *in practice* that exist
right now on the wire.
I was not attempting to make any sort of statement about what the future
should look like.

I personally am very strongly in favor of the unification of messaging
formats, and think that this is one of the most important things that
could come out of USEFOR. I think that it's reasonable to simply require
that Usenet software going forward cope with comments in the References
header and with the full From syntax in RFC 2822 (possibly omitting the
obsolete rules, since they have never been supported on Usenet).

I'm ambivalent about folded dates. The date parsing software that I've
written personally and that is used in the software I maintain supports
them. I don't understand why anyone would generate a folded date, though,
so I can understand why people don't see what purpose is served in
supporting it.

I think that not requiring a space after the colon in headers (except for
compatibility with older messages) is silly, but I don't have a strong
opinion on it. Changing news software to support this can be a rather
significant undertaking, however, given that this rule is clearly
specified in RFC 1036 and the assumption tends to be very widespread in
any code that parses headers in news messages.

The message ID restrictions hit the single hottest code path in every
Usenet transit server, and I really don't see any purpose served by
complicating the parsing algorithm for message IDs solely to support
rather questionable constructions that can be easily avoided. Apart from
that personal opinion, I'll also note that removing those restrictions
would require extremely significant changes to the Usenet infrastructure
and would not be in any sense backwards-compatible; lots of software was
written on the basis of the guarantees provided by RFC 1036.

My primary consideration in the standards work that I do on Usenet article
formats is to support backward compatibility with existing software to the
degree that is feasible.
My secondary consideration is to support unification of the messaging
format in order to get rid of the various places where gatewaying is
difficult for silly and unnecessary reasons.

I consider tighter integration of Usenet and e-mail to be obviously good,
a growing trend, and one of the more interesting applications for Usenet
technology going forward. NNTP is an interesting alternative access
protocol for large public archives of mail messages because of its extreme
simplicity and very lightweight nature, although anonymous IMAP is
certainly a strong competitor with its much more advanced searching
support. (Either is obviously utterly superior to converting all of the
messages to HTML and putting them behind a clumsy web page interface.)
NNTP also has some advantages when it comes to mass distribution of
messages.

> This is because portions of the news community listened to the siren
> song of "just send 8-bits" offered by those individuals whose song was
> rejected in mail. Now the news community has a non-interoperable
> disaster.

It actually works pretty well on Usenet in those hierarchies that have
standardized on a character set. It breaks down very badly whenever those
messages move outside of a pure NNTP system, but I'm afraid that I can't
agree with your characterization given the number of people who are very
happily using untagged character sets in their own hierarchies.

However, I *do* agree with you that using random untagged 8-bit character
sets is obviously not a solution to the problem. It is a bad hack that
works in certain limited and specific situations and actively interferes
with movement to a unified messaging format, and Usenet is already running
hard against its limitations.

> But rather than fix the disaster, they seem to want to inflict a new
> disaster upon the email community.

I would greatly appreciate it if you would be somewhat careful about who
you choose to include in sweeping pronouns like "they."

> The solution to interoperability is to stop claiming that news is
> special, and start playing ball with the rest of the messaging world.

And many news implementors have been doing this for years, so please don't
make blanket statements about what everyone on the Usenet side is doing.
The active members of the USEFOR working group are not a representative
sample of Usenet implementors or users, and many of us who strongly
believe in a unified messaging format gave up on USEFOR in disgust years
ago.

-- 
Russ Allbery ([EMAIL PROTECTED])             <http://www.eyrie.org/~eagle/>
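
For illustration only, the three message ID restrictions listed near the
top of this message could be checked roughly as follows. This is a minimal
sketch, not code from any news server: the function name
check_message_id_header and the constant MAX_MSGID_OCTETS are hypothetical,
and the 250 octet figure is the in-practice limit mentioned above rather
than a value taken from a specification.

```python
import re

# Assumption: the 250 octet in-practice limit mentioned in the message,
# counted including the surrounding angle brackets.
MAX_MSGID_OCTETS = 250


def check_message_id_header(value):
    """Return a list of problems found in a Message-ID header body.

    An empty list means none of the three restrictions was violated.
    """
    problems = []
    # Ignore the space after the colon and any trailing line ending.
    body = value.strip()

    # Nothing but the message ID itself may appear in the header, so a
    # leading or trailing comment means the body is not "<...>" as a whole.
    if not (body.startswith("<") and body.endswith(">")):
        problems.append("header must contain only the <message-id> itself")
        return problems

    inner = body[1:-1]
    # No whitespace and no ">" inside the ID, escaped or not.
    if re.search(r"\s", inner):
        problems.append("whitespace inside message ID")
    if ">" in inner:
        problems.append("'>' inside message ID")

    # Length limit in octets, angle brackets included.
    if len(body.encode("utf-8")) > MAX_MSGID_OCTETS:
        problems.append("message ID longer than 250 octets")
    return problems
```

A header body such as "<abc@example.com>" yields an empty list, while
"<abc@example.com> (comment)" is reported as containing something other
than the message ID. The check deliberately does no unescaping or comment
parsing, which matches the point made above that these IDs sit on a hot
code path and are expected to be trivially parseable.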
