Please do not communicate with me privately on this matter without cc'ing the IMAP and usenet-format mailing lists. Your statements about what I purportedly said were forwarded to me.
I note that there is an I-D, draft-kohn-news-article-00.txt, which addresses Usenet mail format in a compatible fashion without introducing the major incompatibilities that would be inflicted on other protocols (including IMAP) by adding raw, untagged UTF-8 to news message headers. I also note that there is a charter revision in progress for Usefor.

On Fri, 14 Feb 2003, Charles Lindsey wrote:
> > > I don't think the last vestiges of "just send 8-bits" using non-UTF-8
> > > character sets and no MIME tagging are being exterminated, or ever will
> > > be. At the moment they seem to be well entrenched in Usenet, especially
> > > in the Chinese newsgroups, and no amount of deprecating it in standards
> > > is going to stop it. That, sadly, is how the real world works.
> > If this is true (and I fear that it is), that spells doom for any document
> > which attempts to define untagged 8-bit as being UTF-8.
> I don't see why. That current behaviour is totally non-compliant. Now
> whilst it is nice to bring widely deployed non-standard behaviour within
> the standards if it can be done (Chris Newman said this recently on the
> ietf-822 list), it is manifestly impossible in this case. But that is no
> reason to hold the standards process up indefinitely.

Two wrongs don't make a right. This is not how the IETF works. The Kohn I-D suggests a means to accomplish what is necessary without inflicting major incompatibilities and mandates upon other protocols. There are a few minor nits with that document, but it definitely appears to be a step in the right direction.

> Well if you are suggesting a new header along the lines of
>     This-Message-Contains-8bit-Headers: [yes/no]
> (syntax left stupid because such a header might contain lots of other
> goodies as well) then I might well be persuaded. Particularly so if it
> could lead to some way of bringing UTF-8 into Email messages as well
> (though that might be a longer prospect).

This is a start.
But the tag needs to be specifically UTF-8, not 8-bit, and the means of downgrading to a 7-bit environment must be clearly specified.

> > You have to consider what happens when a UTF-8 interpreter is presented
> > with 8-bit text that is not in UTF-8. Fortunately, UTF-8 has a fairly
> > distinctive pattern, and non-trivial non-UTF-8 text is unlikely to
> > replicate it. Nevertheless, you can't assume this.
> On the tests that we did (a whole week's worth on Supernews) the false
> positive rate was almost too small to measure.

So, you are expecting that all implementors of all other messaging protocols do UTF-8 validity checks on untagged 8-bit data, and do what if it fails the validity check? And what happens when there is a mistake? Some small piece of otherwise valid UTF-8 was corrupted and flunks the validity check. Or there's a false positive? Just tell the implementors "On the tests that we did (a whole week's worth on Supernews) the false positive rate was almost too small to measure" so when the customers scream for a fix they have no standard to guide them?!?

> Well that is certainly how Netnews is intended to work, and always has
> been. Once an article has been injected, it should remain unaltered
> (Path headers excepted) until it arrives at the user's reading agent.
> That way, everybody gets to see the identical article. There are
> exceptions for gateways, but it would be far better for IMAP to be
> regarded as a part of the Netnews transport system, than as a gateway
> for downgrading to strict email format.

IMAP is messaging, and the sooner that people stop thinking about "news" and "mail" and start thinking about "messaging", the better.
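
To make the validity-check question above concrete, here is a minimal Python sketch of what "check whether untagged 8-bit data is UTF-8" amounts to. It is an illustration only, not the test run on Supernews; the function name and sample byte strings are assumptions chosen for the example. It shows the three outcomes at issue: non-UTF-8 text that is correctly rejected, non-UTF-8 text that passes anyway (a false positive), and genuine UTF-8 that fails after a single corrupted byte.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Latin-1 "café": the lone 0xE9 byte is not a complete UTF-8 sequence,
# so the check rejects it.
latin1_header = "café".encode("latin-1")

# The Latin-1 text "Ã©" is the byte pair C3 A9, which is also the valid
# UTF-8 encoding of "é". The check passes: a false positive.
ambiguous_header = "Ã©".encode("latin-1")

# UTF-8 "naïve" (6E 61 C3 AF 76 65) with the continuation byte AF
# corrupted to FF. The check fails on otherwise valid UTF-8.
corrupted_header = b"na\xc3\xffve"
```

The check itself is trivial; what it cannot tell an implementor is what to do with `latin1_header` and `corrupted_header`, or how to notice that `ambiguous_header` was misread.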

> And I do not see anywhere in RFC 2060 where the IMAP protocol permits
> any part of the actual message to be altered in any way - AFAICS the
> client gets to see exactly what was delivered to the IMAP server by
> SMTP, or NNTP, or however else it came in (well, I see facilities for
> supplying headers only, or individual body parts only, but those are not
> at issue).

IMAP does not allow untagged 8-bit data. If the mail store has such data as a result of compliant protocol exchanges, an IMAP server is obligated to do what is necessary to comply with the requirements of IMAP. Currently there is no compliant protocol exchange that would cause this to happen. An IMAP server today can weasel out with "garbage in, garbage out". You are proposing to introduce such an exchange. You recognize that this impacts IMAP, hence your discussions for an extension, but you have not broached the issue of complete interoperability and compliance with and without the extension.

> > If so, then OK, make newsgroup names be UTF-8.
> In which case IMAP needs to accept these headers at least and pass them
> to the clients.

Not if the Newsgroups header complies with the Kohn I-D.

> > IMAP extensions are negotiated between client and server. The server must
> > offer, and a client must approve, an extension before it takes effect.
> Hmmm! I didn't see that in any of the existing extensions I looked at.
> But they all seemed to be of the nature "Server offers capability of
> some new command (e.g. IDLE); client observes capability and proceeds to
> use the new command".

WRONG!!! The server is FORBIDDEN from using ANY extension unless the client approves. The common method of client approval is by the client using a command tied to the extension; if the client uses the command then the server knows that the client has approved. But a server MUST NOT unilaterally use an extension without client approval. It can only offer the availability of the extension on the server.
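
The offer/approve rule can be sketched as a toy session model in Python. This is not a real IMAP implementation: the class, the capability name `UTF8-HEADERS`, and the command name `ENABLE8BIT` are hypothetical, used only to show the ordering of events. The server advertises the extension, but keeps emitting a 7-bit form (here an RFC 2047 encoded-word produced by the standard `email.header` module) until the client issues the command tied to the extension.

```python
from email.header import Header


class ImapSession:
    """Toy model of one client/server session (illustration only)."""

    CAPABILITY = "UTF8-HEADERS"    # hypothetical capability name
    ENABLE_COMMAND = "ENABLE8BIT"  # hypothetical command name

    def __init__(self):
        # Default state: the extension is offered but not in effect.
        self.utf8_approved = False

    def capabilities(self):
        # Advertising the capability is only an offer.
        return ["IMAP4rev1", self.CAPABILITY]

    def handle_command(self, command):
        # The client approves by using the command tied to the extension.
        if command.upper() == self.ENABLE_COMMAND:
            self.utf8_approved = True

    def fetch_subject(self, subject):
        # Pure ASCII needs no special handling either way.
        if self.utf8_approved or subject.isascii():
            return subject.encode("utf-8")
        # No approval: fall back to a 7-bit RFC 2047 encoded-word.
        return Header(subject, "utf-8").encode().encode("ascii")
```

A client that never sends the approving command only ever sees 7-bit header data from `fetch_subject`, regardless of what `capabilities()` lists. Note that the 7-bit form has a different length from the raw UTF-8 bytes.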

> But mine is the other way around. "Server announces that some messages
> will be sent with 8-bit in headers; client is supposed to accept the new
> goodies with gratitude".

And what I am telling you is that that cannot be done. Period.

> So I think you are asking for a new command
> of the form "Please do/do not send me these new-fangled things". Or,
> putting it more politely, an ENABLE8BIT command. Is that correct?

In effect. The new command is "you may send me these new-fangled things". In the absence of the client negotiating such a command, the server MUST do what is necessary to comply with the old-fashioned way.

> > The idea of dropping a message on the floor because an extension is not
> > negotiated is so repulsive, it's best not discussed further.
> Well what does currently happen if 8-bit headers arrive at the server by
> SMTP? Presumably a 5xx response in that case, though you do say later
> on:
> > 8-bit characters are not allowed in headers, period. On my system, such
> > messages are considered to be spam.
> In the case of such headers arriving at the server by NNTP, dropping on
> the floor is the only option.

Red herring. 8-bit headers are prohibited in SMTP. They should be prohibited in NNTP for the same reason. The Kohn draft solves the problem without using 8-bit headers.

> > In other words, if a "talk UTF-8 headers" extension is not negotiated (and
> > this will be the default), then the server must regenerate the message
> > with 7-bit MIME quoted-words and update all sizes to stay compliant with
> > the specification.
> That is, in general, impossible

Which is why it's a bad idea to send 8-bit headers, and why an alternative such as the Kohn draft is the correct solution. Implementors (and I am an implementor) will rightfully object to mandates that are impossible to implement.

> I am trying to get a standard
> that makes clear when and where it is allowed and the extent to which it
> is expected to work. Maybe the IESG does not accept that standard and we
> have to think again, but that is my problem (or rather my WG's problem).

I have reason to believe that the IESG is not going to accept a document which designers and implementors of other protocols will object to, particularly when those objections are based upon the admitted existence of impossible mandates.

Once again, I strongly suggest that you consider the direction outlined in the Kohn draft. It may not be as elegant as raw UTF-8 in headers (which everybody agrees would be the ideal) but it doesn't inflict this trauma either.

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
