On Sat, 8 Feb 2003, Charles Lindsey wrote:
> I don't think the last vestiges of "just send 8-bits" using non-UTF-8
> character sets and no MIME tagging are being exterminated, or ever will
> be. At the moment they seem to be well entrenched in Usenet, especially
> in the Chinese newsgroups, and no amount of deprecating it in standards
> is going to stop it. That, sadly, is how the real world works.

If this is true (and I fear that it is), that spells doom for any document
which attempts to define untagged 8-bit as being UTF-8.  As an
implementor, I would oppose the standardization of any document which
pretended otherwise, because it would not be possible to implement the
standard without colliding with this real world problem.  I would also
lobby the IESG to reject such a document for this reason.

Let's be clear on something; the IETF is not in the business of creating
pie in the sky specifications that nobody will (or can) implement.
Historically, when real-world considerations collide with a design, the
IETF has sought an alternative that does not collide.  If the IETF has a god,
the name of that god is Interoperability.

> The view on Usefor is that if you give them a legitimate way to do it
> (i.e. UTF-8, because there really is nothing else), then enough of them
> might, just might, switch over to it.

Although well-intentioned, by itself this is not good enough.

There MUST be a mandatory token, someplace within the message, to indicate
that the message complies with Usefor; and a Usefor-compliant message
reader MUST NOT interpret the message as Usefor-compliant (and thus that
untagged 8-bit is UTF-8) without that token.
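To make the rule concrete, here is a minimal sketch in Python.  The
header name "X-Usefor-Compliant" is purely hypothetical (no draft
defines it), as is the function name; the only point is that the reader
refuses to assume UTF-8 when the token is absent.

```python
from typing import Optional

# Hypothetical compliance token; this name is an assumption for illustration only.
COMPLIANCE_HEADER = b"x-usefor-compliant"


def charset_for_untagged_8bit(raw_headers: bytes) -> Optional[str]:
    """Return 'utf-8' only when the (hypothetical) token header is present.

    None means "charset unknown": the reader must not treat untagged
    8-bit octets as UTF-8.
    """
    for line in raw_headers.splitlines():
        # Folded continuation lines start with whitespace; they are not header names.
        if line[:1] in (b" ", b"\t"):
            continue
        name, sep, _value = line.partition(b":")
        if sep and name.strip().lower() == COMPLIANCE_HEADER:
            return "utf-8"
    return None


tagged = b"Subject: \xe4\xb8\xad\xe6\x96\x87\r\nX-Usefor-Compliant: 1\r\n"
untagged = b"Subject: \xa4\xa4\xa4\xe5\r\n"
print(charset_for_untagged_8bit(tagged))
print(charset_for_untagged_8bit(untagged))
```

Where the token lives (a header, a MIME parameter, something else) is a
design decision for the working group; the sketch only shows the
MUST NOT side of the rule.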

If you choose to reject this advice and create a non-interoperable
proposal, then do not be surprised if Usefor gets pushed back by the IESG.

> In fact, the Europeans probably
> will, and the Japanese.

Who are "the Europeans"?  The left-wing media here says that "the
Europeans" oppose US military action in Iraq, whereas the right-wing media
says that two dozen European countries support such action.

Japanese messaging is 7-bit JIS 0208 with ISO 2022 encoding.  Japanese
text already has two commonly-used and incompatible 8-bit forms, and thus
8-bit has made no headway for messaging.  So they would have less problem
with 8-bit UTF-8.  However, my recent inquiries to my Japanese friends
about UTF-8 email in Japan were not encouraging.

> The Chinese are another matter.

This is a problem.  A *serious* problem.  This cannot be lightly
dismissed.

> For sure, Usefor
> declares anything other than UTF-8 as "non compliant", but it takes the
> view that it is better to let the clients see the non-compliant stuff
> (they might just be able to make out what it is meant to be) than to
> prevent them from seeing it at all.

Whoa, pardner!

You have to consider what happens when a UTF-8 interpreter is presented
with 8-bit text that is not in UTF-8.  Fortunately, UTF-8 has a fairly
distinctive pattern, and non-trivial non-UTF-8 text is unlikely to
replicate it.  Nevertheless, you can't assume this.
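A small Python sketch illustrates both halves of this.  It uses only
the standard codecs; the function name and the sample strings are my
own choices for illustration.  Legacy-charset octets usually fail a
strict UTF-8 decode, but a short sequence can pass by accident.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the octets decode as strict UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Typical case: legacy 8-bit text is rejected by a strict UTF-8 decoder.
big5_octets = "中文".encode("big5")
latin1_octets = "café".encode("latin-1")
print(looks_like_utf8(big5_octets))
print(looks_like_utf8(latin1_octets))

# Counterexample: these two octets are valid UTF-8 ("é"), yet they are
# equally valid ISO-8859-1 (the two characters "Ã©").  Validity alone
# cannot tell you which one the sender meant.
ambiguous = b"\xc3\xa9"
print(looks_like_utf8(ambiguous), ambiguous.decode("latin-1"))
```

So a validity check is a useful heuristic for catching mislabeled text,
but it is not a substitute for an explicit label.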

There must be a mechanism by which messaging text is affirmatively
identified as being UTF-8 and compliant.

> But are you saying that if Usefor mandates 8-bit clean headers for
> news, and the IESG accepts their draft (for the sake of argument - they
> might or they might not) then you would still refuse to bring the IMAP
> protocol into line?

This is not how the IETF process works.  The fact that you've said such a
thing implies that you badly misunderstand the process.

No standard exists in a vacuum.  All standards must interact and
interoperate with all other standards that they touch.

In other words, Usefor's proposed changes to messaging data must interact
and interoperate with IMAP, and in turn must interact and interoperate
with RFC 2822 and SMTP.

This is why I'm saying that this part of the Usefor effort should be
spawned off to a separate working group tasked with i18n for messaging.

> All I am trying to do with my suggested extension is
> to find a mechanism whereby the two protocols change at the same time.
> Either the Usefor proposal is accepted, in which case the IMAP extension
> comes with it, or it is not accepted, in which case both fail.

Assuming that you want success and not failure, you need a different
strategy.

I recommend that Usefor concentrate on fixing NNTP, and defining necessary
extensions to 2822 and MIME for news that do *NOT* conflict with current
usage.

Punt messaging i18n to another working group tasked for that, instead of
trying to take over i18n for all other forms of messaging.

> No, the extension I proposed did not require the server to convert
> anything. If it comes in as encoded-words, you send it to the client
> that way. If it comes in as UTF-8, you send it to the client that way.
> If it comes in as anything else, you still send it to the client as-is.
> It is the client's job to process it.

WRONG!  This is not how IMAP works.  If these are your assumptions, this
explains part of the misunderstanding.

It would be much better if Usefor comes up with a proposal for how the
message will be represented, and allow the IMAP community to determine how
IMAP will be extended.

> Newsgroup-names 'on the wire' will be in one
> canonical format - UTF-8 as things stand.

Is it confirmed that there are no non-ASCII newsgroup names?

If so, then OK, make newsgroup names be UTF-8.

If not, then you had better not do this unless you create alternatives to
LISTGROUP, GROUP, etc. that support UTF-8.  As an implementor, I can *NOT*
be asked to implement something that would be a bug in existing systems.


> I am not sure what you mean by "negotiate the extension". What commands
> would the client issue to do that? My understanding was that if the
> extension was not supported (or not negotiated), then articles with
> 8-bit headers would be dropped on the floor; the client would never see
> them.

IMAP extensions are negotiated between client and server.  The server must
offer, and a client must approve, an extension before it takes effect.

The idea of dropping a message on the floor because an extension is not
negotiated is so repulsive, it's best not discussed further.  Assume that
there has to be a defined method of handling all messages that complies
with the specification, both with and without the negotiated extension.

In other words, if a "talk UTF-8 headers" extension is not negotiated (and
this will be the default), then the server must regenerate the message
with 7-bit MIME encoded-words and update all sizes to stay compliant with
the specification.  This is not an inexpensive undertaking.
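To give a feel for the work involved, here is a rough Python sketch
using the standard email.header module.  It assumes the raw header
value really is UTF-8 (which is exactly what the server cannot know
without a label); the function name and sample subject are
illustrative.

```python
from email.header import Header, decode_header


def to_encoded_words(raw_value: bytes) -> str:
    """Re-encode a raw 8-bit header value as 7-bit RFC 2047 encoded-words.

    Assumes raw_value is UTF-8; raises UnicodeDecodeError if it is not.
    """
    text = raw_value.decode("utf-8")
    return Header(text, charset="utf-8").encode()


raw_subject = "Grüße aus Zürich".encode("utf-8")
encoded = to_encoded_words(raw_subject)

# The regenerated header is pure ASCII but a different length, so every
# size the server reports for the message has to be recomputed.
print(encoded)
print(len(raw_subject), len(encoded.encode("ascii")))

# Decoding the encoded-words recovers the original octets.
recovered = b"".join(part for part, _charset in decode_header(encoded))
print(recovered == raw_subject)
```

Multiply this by every header of every message fetched, plus the size
bookkeeping, and the cost becomes apparent.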

> Is that not what happens in present implementations, on account of
> the "MUST be 7-bit; 8-bit characters are not permitted in headers" in
> the FETCH response?

8-bit characters are not allowed in headers, period.  On my system, such
messages are considered to be spam.

> >  3) If header items or mailbox names are 8-bit but *NOT* UTF-8, then the
> >     server has no way of doing the right thing.  Since (2) will be the
> >     more common case than (1), the server will end up causing further
> >     damage to the data unless it is lucky enough to recognize a sequence
> >     of octets that is invalid UTF-8.
> I still maintain it is the client's problem.

This is not acceptable, and I strongly recommend that you take a different
position if you want to achieve success.


> However, as regards 8-bit in headers,
> there is a very strong feeling that it will never be fixed unless
> someone makes a first move (and the inaction of the standards bodies to
> make a move thus far is the exact cause of the move having been made by
> the marketplace, with the unfortunate consequences that we all see).

I respectfully suggest that "cowboy" type actions, such as you are
proposing that Usefor do, are the very reason for this "inaction."

> And BTW I have just remembered an item #4. RFC 2231 makes (AFAICS) a
> surreptitious change to the IMAP protocol. Have you taken that one on
> board?

I prefer to ignore RFC 2231.  It seems to be very rarely used.  It doesn't
change the IMAP protocol at all; rather, it places a mandate on the MIME
parser used by IMAP servers.  Even then it's only a SHOULD.

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
