On Mon, 10 Feb 2003 15:30:10 -0800 (Pacific Standard Time)
Mark Crispin <[EMAIL PROTECTED]> said...
>
> On Sat, 8 Feb 2003, Charles Lindsey wrote:
> > I don't think the last vestiges of "just send 8-bits" using non-UTF-8
> > character sets and no MIME tagging are being exterminated, or ever will
> > be. At the moment they seem to be well entrenched in Usenet, especially
> > in the Chinese newsgroups, and no amount of deprecating it in standards
> > is going to stop it. That, sadly, is how the real world works.
>
> If this is true (and I fear that it is), that spells doom for any document
> which attempts to define untagged 8-bit as being UTF-8.
I don't see why. That current behaviour is totally non-compliant. Now
whilst it is nice to bring widely deployed non-standard behaviour within
the standards if it can be done (Chris Newman said this recently on the
ietf-822 list), it is manifestly impossible in this case. But that is no
reason to hold the standards process up indefinitely.
> There MUST be a mandatory token, someplace within the message, to indicate
> that the message complies with Usefor; and a Usefor-compliant message
> reader MUST NOT interpret the message as Usefor-compliant (and thus that
> untagged 8-bit is UTF-8) without that token.
Well if you are suggesting a new header along the lines of
This-Message-Contains-8bit-Headers: [yes/no]
(syntax left stupid because such a header might contain lots of other
goodies as well) then I might well be persuaded. Particularly so if it
could lead to some way of bringing UTF-8 into Email messages as well
(though that might be a longer prospect).
>
> > In fact, the Europeans probably
> > will, and the Japanese.
>
> Who are "the Europeans"? The left-wing media here says that "the
> Europeans" oppose US military action in Iraq, whereas the right-wing media
> says that two dozen European countries support such action.
But they have much less deployment of untagged 8bit than the Chinese and
it arises on a much more ad hoc basis - there is no "national policy" to
get in the way.
>
> Japanese messaging is 7-bit JIS 0208 with ISO 2022 encoding. Japanese
> text already has two commonly-used and incompatible 8-bit forms, and thus
> 8-bit has made no headway for messaging.
Which is fine. They will carry on using their ISO 2022. If/when they decide to
go 8-bit they will likely go to UTF-8 if that is by then the standard. And if
they decide not to use 8-bit, then the question does not arise.
> > The Chinese are another matter.
>
> This is a problem. A *serious* problem. This can not be lightly
> dismissed.
If the Chinese were to continue with their own 8-bit even after UTF-8
had been made the norm, then they would comprise a "cooperating subnet"
(a term defined in Usefor) and they should just be left to get on with
it. Those who wanted to would still find some way to read it, just as
they do now, so they would be no worse off than at present.
> Whoa, pardner!
>
> You have to consider what happens when a UTF-8 interpreter is presented
> with 8-bit text that is not in UTF-8. Fortunately, UTF-8 has a fairly
> distinctive pattern, and non-trivial non-UTF-8 text is unlikely to
> replicate it. Nevertheless, you can't assume this.
On the tests that we did (a whole week's worth on Supernews) the false
positive rate was almost too small to measure.
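
A minimal sketch, in Python, of the kind of check being discussed: does a run of untagged 8-bit octets happen to be well-formed UTF-8? This is not the test that was run on Supernews. The function name and the sample strings are illustrative assumptions only.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if raw has 8-bit octets and is well-formed UTF-8.

    Hypothetical helper for illustration; pure 7-bit input returns
    False because it carries no evidence either way.
    """
    if raw.isascii():
        return False
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Illustrative samples (assumed, not taken from any real article).
samples = {
    "utf-8": "Grüße aus Köln".encode("utf-8"),
    "latin-1": "Grüße aus Köln".encode("latin-1"),
    "gb2312": "中文新闻组".encode("gb2312"),
    "ascii": b"plain text",
}

for label, octets in samples.items():
    print(label, looks_like_utf8(octets))
```

Short strings in other 8-bit charsets can still pass such a check by accident, so a result of True is evidence, not proof.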
>
> I recommend that Usefor concentrate on fixing NNTP, and defining necessary
> extensions to 2822 and MIME for news that do *NOT* conflict with current
> usage.
No, NNTP is explicitly outside of our remit. There is a separate WG for
that (and they also are looking to UTF-8).
>
>
> > No, the extension I proposed did not require the server to convert
> > anything. If it comes in as encoded-words, you send it to the client
> > that way. If it comes in as UTF-8, you send it to the client that way.
> > If it comes in as anything else, you still send it to the client as-is.
> > It is the client's job to process it.
>
> WRONG! This is not how IMAP works. If these are your assumptions, this
> explains part of the misunderstanding.
Well that is certainly how Netnews is intended to work, and always has
been. Once an article has been injected, it should remain unaltered
(Path headers excepted) until it arrives at the user's reading agent.
That way, everybody gets to see the identical article. There are
exceptions for gateways, but it would be far better for IMAP to be
regarded as a part of the Netnews transport system, than as a gateway
for downgrading to strict email format.
And I do not see anywhere in RFC 2060 where the IMAP protocol permits
any part of the actual message to be altered in any way - AFAICS the
client gets to see exactly what was delivered to the IMAP server by
SMTP, or NNTP, or however else it came in (well, I see facilities for
supplying headers only, or individual body parts only, but those are not
at issue).
>
>
> > Newsgroup-names 'on the wire' will be in one
> > canonical format - UTF-8 as things stand.
>
> Is it confirmed that there are no non-ASCII newsgroup names?
Yes, there are currently NO non-ASCII newsgroup names on the public
Usenet. Not even the Chinese have tried to do that (yet).
>
> If so, then OK, make newsgroup names be UTF-8.
In which case IMAP needs to accept these headers at least and pass them
to the clients.
>
> If not, then you had better not do this unless you create alternatives to
> LISTGROUP, GROUP, etc. that support UTF-8. As an implementor, I can *NOT*
> be asked to implement something that would be a bug in existing systems.
Existing NNTP implementations handle those things just fine. There is
an existing group dk.test.utf8-��� which is available on many servers
worldwide and lots of people have accessed it successfully using NNTP
(sorry, there is _one_ non-ASCII group on Usenet).
>
>
> > I am not sure what you mean by "negotiate the extension". What commands
> > would the client issue to do that? My understanding was that if the
> > extension was not supported (or not negotiated), then articles with
> > 8-bit headers would be dropped on the floor; the client would never see
> > them.
>
> IMAP extensions are negotiated between client and server. The server must
> offer, and a client must approve, an extension before it takes effect.
Hmmm! I didn't see that in any of the existing extensions I looked at.
But they all seemed to be of the nature "Server offers capability of
some new command (e.g. IDLE); client observes capability and proceeds to
use the new command".
But mine is the other way around. "Server announces that some messages
will be sent with 8-bit in headers; client is supposed to accept the new
goodies with gratitude". So I think you are asking for a new command
of the form "Please do/do not send me these new-fangled things". Or,
putting it more politely, an ENABLE8BIT command. Is that correct?
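
To make the question concrete, here is a rough Python sketch of the client side of such a negotiation. The ENABLE8BIT command name is the one floated above. The capability name "8BITHEADERS", the tag "a001", and both function names are assumptions invented for this sketch; none of them is defined by RFC 2060 or any existing extension.

```python
from typing import List, Set

# Both names are hypothetical; no existing IMAP extension defines them.
HYPOTHETICAL_CAPABILITY = "8BITHEADERS"
HYPOTHETICAL_COMMAND = "ENABLE8BIT"


def parse_capabilities(line: str) -> Set[str]:
    """Parse an untagged '* CAPABILITY ...' response into a set of names."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "*" or tokens[1].upper() != "CAPABILITY":
        raise ValueError("not a CAPABILITY response: %r" % line)
    return {token.upper() for token in tokens[2:]}


def commands_to_send(capability_line: str, client_wants_8bit: bool) -> List[str]:
    """Return the commands a client would issue to opt in, if any.

    The extension takes effect only when the server offers it AND the
    client asks for it; otherwise nothing is sent and behaviour is unchanged.
    """
    offered = parse_capabilities(capability_line)
    if client_wants_8bit and HYPOTHETICAL_CAPABILITY in offered:
        return ["a001 " + HYPOTHETICAL_COMMAND]
    return []


print(commands_to_send("* CAPABILITY IMAP4rev1 IDLE 8BITHEADERS", True))
print(commands_to_send("* CAPABILITY IMAP4rev1 IDLE", True))
```

The point of the sketch is only the shape of the exchange: server offers, client approves, and the default remains the existing behaviour.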
>
> The idea of dropping a message on the floor because an extension is not
> negotiated is so repulsive, it's best not discussed further.
Well what does currently happen if 8-bit headers arrive at the server by
SMTP? Presumably a 5xx response in that case, though you do say later
on:
> 8-bit characters are not allowed in headers, period. On my system, such
> messages are considered to be spam.
In the case of such headers arriving at the server by NNTP, dropping on
the floor is the only option.
> In other words, if a "talk UTF-8 headers" extension is not negotiated (and
> this will be the default), then the server must regenerate the message
> with 7-bit MIME quoted-words and update all sizes to stay compliant with
> the specification.
That is, in general, impossible, except with headers that are recognized
and parsed by the server. If I send you a header
Foo-Bar: <(���)> (those characters being in UTF-8)
where the Foo-Bar header is defined in some standards-track RFC that was
published last week, then you have no idea of the proper way to convert
it to RFC 2047. The best I have come up with is to change it to an
X-Foo_Bar header, whereupon it can be treated as unstructured, and that
is what Usefor recommends, but only for gateways, not for delivering to
user agents.
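
A minimal Python sketch of that gateway fallback, assuming the raw header value is UTF-8: rename the unrecognized header to an X- form, treat it as unstructured, and encode the value as an RFC 2047 encoded-word. The function name and the sample value are placeholders of mine; only the Foo-Bar to X-Foo_Bar renaming comes from the text above.

```python
from email.header import Header, decode_header, make_header
from typing import Tuple


def downgrade_unknown_header(name: str, raw_value: bytes) -> Tuple[str, str]:
    """Downgrade an unrecognized 8-bit header for a 7-bit-only recipient.

    Hypothetical helper: assumes raw_value is UTF-8 and that the header's
    structure is unknown, so it can only be handled as unstructured text.
    """
    value = raw_value.decode("utf-8")
    if value.isascii():
        return name, value  # nothing to downgrade
    new_name = "X-" + name.replace("-", "_")
    encoded = Header(value, charset="utf-8", header_name=new_name).encode()
    return new_name, encoded


# Placeholder value; not the characters from the original message.
sample = "<(äöü)>".encode("utf-8")
new_name, encoded = downgrade_unknown_header("Foo-Bar", sample)
print(new_name + ": " + encoded)

# Round trip: a reader that understands RFC 2047 recovers the text.
print(str(make_header(decode_header(encoded))))
```

Note what is lost: the recipient sees X-Foo_Bar, not Foo-Bar, so any agent that understands the real header no longer recognizes it. That is why this is tolerable for a gateway but not for delivery to user agents.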
>
> > However, as regards 8-bit in headers,
> > there is a very strong feeling that it will never be fixed unless
> > someone makes a first move (and the inaction of the standards bodies to
> > make a move thus far is the exact cause of the move having been made by
> > the marketplace, with the unfortunate consequences that we all see).
>
> I respectfully suggest that "cowboy" type actions, such as you are
> proposing that Usefor do, are the very reason for this "inaction."
No, the cowboys are those who start sending this stuff contrary to the
standard (like the Chinese, it seems). I am trying to get a standard
that makes clear when and where it is allowed and the extent to which it
is expected to work. Maybe the IESG does not accept that standard and we
have to think again, but that is my problem (or rather my WG's problem).
> I prefer to ignore RFC 2231. It seems to be very rarely used.
I too would prefer to ignore RFC 2231, but unfortunately I can't.
Apparently it does get used in a few places and some user agents do
understand it.
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133 Web: http://www.cs.man.ac.uk/~chl
Email: [EMAIL PROTECTED] Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5