> In article 
> <ofc0fea11b.05dda05c-onc125826a.0038eb98-c125826a.0038f...@notes.na.collabserv.com>
>  you write:
> >
> >Hello folks
> >
> >I've been tasked with finding out what the general consensus is on the
> >support in email headers for international characters such as UTF-8
> >characters, including accented characters, and possibly Asian and
> >Cyrillic characters as well.
> >
> >I know there's an RFC from 2012, but my Product Dev people are interested
> >in knowing how wide-spread the actual adoption is.

> Funny you should ask.  I'm doing some work for the UASG group to document how
> internationalized email (known as EAI) works.

> UTF-8 in everything except the actual addresses can be in MIME body
> parts and encoded-words in mail headers.  Those have been around for
> at least a decade and should work everywhere.

For some value of "work", yes. While the risks of the so-called "mailsploit"
vulnerabilities were overblown, encoded-words still have issues, even after all
this time.

The good news is that it's been a while since I've gotten a report of an issue
elsewhere in MIME.
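
For anyone who hasn't looked at encoded-words lately, here is a minimal
sketch of the round trip using Python's standard email.header module. The
helper names and the sample subject are mine, purely for illustration; this
shows the mechanism only and says nothing about how any particular client
copes with malformed input.

```python
from email.header import Header, decode_header, make_header


def encode_header_value(text: str) -> str:
    """Return an ASCII-only header value built from RFC 2047 encoded-words."""
    return Header(text, charset="utf-8").encode()


def decode_header_value(raw: str) -> str:
    """Turn any encoded-words in a raw header value back into Unicode text."""
    return str(make_header(decode_header(raw)))


# Hypothetical sample subject, not taken from any real message.
subject = "Grüße aus Köln"
wire = encode_header_value(subject)

print(wire)                       # ASCII-only form that goes on the wire
print(decode_header_value(wire))  # what the recipient's MUA should display
```

The point is that the wire form stays pure ASCII, which is why this works
without any SMTP extension.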

> RFCs 6530-6533 defined an SMTP extension called SMTPUTF8 which, to
> oversimplify a little, allows UTF-8 anywhere you can have ASCII,
> including in both the local part and the domain part of the addresses.
> This modifies both the messages themselves and the address in the
> SMTP dialog MAIL FROM and RCPT TO.

It also imposes an end-to-end support requirement, that is, everything along
the MUA->MSA->1*(MTA->)MDA->MS->MUA path must be upgraded to support EAI. And
this requirement exists even when none of the current recipients of the message
are EAI recipients, due to the possible presence of EAI addresses in the header
and MAIL FROM.

This is proving to be highly problematic in practice. And the additional
support needed to handle it properly in an operational context, beyond what
the standard requires, is IMO nontrivial.
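
To make the end-to-end point concrete, here is a rough sketch of the test
that decides whether a message needs the extension. The function name and
the example addresses are hypothetical, and it assumes header values are
given as they would appear on the wire (so encoded-words count as ASCII).

```python
from typing import Iterable, Mapping


def needs_smtputf8(
    mail_from: str,
    rcpt_to: Iterable[str],
    headers: Mapping[str, str],
) -> bool:
    """Return True if any envelope address or raw header value is non-ASCII."""
    envelope = [mail_from, *rcpt_to]
    if any(not addr.isascii() for addr in envelope):
        return True
    # Raw UTF-8 in a header makes the message EAI even when every
    # recipient address is plain ASCII.
    return any(not value.isascii() for value in headers.values())


# Hypothetical example: ASCII envelope, but a UTF-8 address in the Cc header.
print(needs_smtputf8(
    "alice@example.com",
    ["bob@example.net"],
    {"Cc": "用户@example.org", "Subject": "hello"},
))
```

Note that the recipient list alone tells you nothing; the MAIL FROM and the
headers have to be examined too.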

> Uptake has been slow, but Gmail quietly added support last year, and
> Hotmail/Outlook/Live added support about a month ago.  Some of the
> large Chinese services like Coremail support it as do some Indian
> services like Xgenplus.  Yahoo/AOL/Oath have as far as I can tell no
> plans to support it.

Not to sound like a broken record, but... for some value of "support", maybe.

> The Gmail and Hotmail support handles other people's UTF-8 addresses
> in mail but they still don't provide UTF-8 addresses on their own
> systems.

From what I can tell, Gmail's and outlook.com's support is basically "just send
UTF-8", that is, they will send EAI messages even when the receiving server
doesn't offer the extension. The only time they check is when the actual
recipient address is EAI; in that case the extension is required. But if the
message is EAI only because of EAI headers, they just blast away.

Now, you can certainly argue that "just send it" is the "right" thing to do.
Average users don't know what's ASCII and what's not, and don't care to know.
They just want their mail delivered. So when they send a message to two people,
one with an EAI address and one with a regular address, and one copy bounces
for no obvious reason, I'd have to call that a pretty serious violation of the
least astonishment principle.

Nevertheless, it's contrary to what the standards require - that the message
bounce - so calling it "support" is a bit of a stretch.

As far as open source MTAs go, the support in Postfix appears to be standards
compliant - EAI messages sent to non-EAI recipients are bounced. See:

  http://www.postfix.org/SMTPUTF8_README.html

for specifics.

FWIW, I think the correct thing to do is downgrade EAI messages sent to non-EAI
recipients on non-EAI hosts, but doing this is a significant PITA for a bunch
of reasons I'm not going to get into now. Also FWIW, this is what our MTA
defaults to, but we also support "reject" and "just send it" in case
someone wants those behaviors.
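
The three behaviors can be laid out as a small decision function. To be
clear, this is an illustrative sketch and not our MTA's actual code; all the
names are made up, and the downgrade step itself (the painful part) is not
shown.

```python
from enum import Enum


class Policy(Enum):
    REJECT = "reject"        # what the standards call for
    JUST_SEND = "just send"  # send the UTF-8 anyway
    DOWNGRADE = "downgrade"  # rewrite the message into an ASCII-safe form


class Action(Enum):
    SEND_AS_IS = "send as is"
    SEND_WITH_SMTPUTF8 = "send with SMTPUTF8"
    SEND_DOWNGRADED = "send downgraded"
    BOUNCE = "bounce"


def next_hop_action(
    message_is_eai: bool,
    rcpt_is_eai: bool,
    server_offers_smtputf8: bool,
    policy: Policy,
) -> Action:
    """Pick what to do with one recipient at one next hop."""
    if not message_is_eai:
        return Action.SEND_AS_IS
    if server_offers_smtputf8:
        return Action.SEND_WITH_SMTPUTF8
    if rcpt_is_eai:
        # A non-ASCII recipient address cannot be sent without the extension.
        return Action.BOUNCE
    if policy is Policy.REJECT:
        return Action.BOUNCE
    if policy is Policy.JUST_SEND:
        return Action.SEND_AS_IS
    return Action.SEND_DOWNGRADED


# EAI headers, ASCII recipient, next hop without SMTPUTF8:
for p in Policy:
    print(p.name, "->", next_hop_action(True, False, False, p).name)
```

The "just send" branch corresponds to the behavior described above for the
large providers, where only an EAI recipient address triggers a check.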

> It is my impression that the main interest is currently in
> India since some bits of the government are planning to hand out
> e-mail addresses to go with the biometric IDs, and a lot of Indians
> are literate in their own languages, which are written in their own
> scripts, but not English.

Which may be sufficient to induce the larger MSPs and ISPs in the US and Europe
to upgrade. But not the smaller ones, who Just Don't Care. Maybe, if
all of the open source MTAs and message stores start supporting EAI, they
can be asked to flip the necessary switches. But I'm not optimistic.

Amusingly, the thing that may drive US adoption is the desire to put emoji in
addresses, which seems to be widespread. Sigh.

> Having recently written EAI support into my own qmail system I can say
> that the basic address handling was a lot easier than I expected,
> since most mail code these days is already 8-bit clean sort of by
> accident.

I agree that this isn't difficult. What's difficult is keeping track of the
EAI-ness of a message as it goes through processing like alias expansion, which
can turn a non-EAI message into an EAI message or vice versa.
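
A toy illustration of why the flag has to be recomputed after every
rewriting step. The alias table, addresses, and function names here are all
invented, and the sketch looks only at the envelope; headers would need the
same treatment.

```python
from typing import Dict, List

# Hypothetical alias table: an ASCII alias that expands to a UTF-8 address,
# and a UTF-8 alias that expands to an ASCII one.
ALIASES: Dict[str, List[str]] = {
    "sales@example.com": ["josé@example.com", "ann@example.com"],
    "用户@example.org": ["user@example.net"],
}


def expand(rcpts: List[str], aliases: Dict[str, List[str]]) -> List[str]:
    """Apply one level of alias expansion to a recipient list."""
    out: List[str] = []
    for rcpt in rcpts:
        out.extend(aliases.get(rcpt, [rcpt]))
    return out


def envelope_is_eai(mail_from: str, rcpts: List[str]) -> bool:
    """Return True if any envelope address contains non-ASCII characters."""
    return any(not addr.isascii() for addr in [mail_from, *rcpts])


sender = "alice@example.com"
for original in (["sales@example.com"], ["用户@example.org"]):
    expanded = expand(original, ALIASES)
    print(original, envelope_is_eai(sender, original),
          "->", expanded, envelope_is_eai(sender, expanded))
```

The first case goes from non-EAI to EAI, the second the other way around, so
a flag computed once at submission time is wrong in both.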

Support for the nested encodings that message/global creates may also be
nontrivial.

> The hardest part, which I haven't done yet, is generalizing
> the address mapping that MTAs do on incoming mail.  Converting between
> upper and lower case is remarkably language-specific, even in
> languages written in Latin characters.  Add things like all the ways
> Unicode can represent accented characters, the meaning of
> o-with-umlaut which is short for "oe" in German but not in
> Scandinavia, and Chinese traditional and simplified characters and
> it's a challenge to make addresses work in ways that seem natural in
> whatever language the address is written in.

This I frankly don't care about, as I believe that doing it in a meaningful
language-specific way is impossible.

Like it or not, allowing Unicode in local-parts exposes the assertion that
"local-parts belong to their associated domain and should not be interpreted by
anyone else" as the lie it's been for 30+ years.

Any software that implements address whitelists, blacklists, contact lists,
address-based mailing list access controls, address duplicate elimination, or
any of a vast array of other functions is already comparing local-parts as part
of its operation. To the best of my knowledge these comparisons are
universally done in a case-insensitive way. (Which means case-sensitive
local-parts don't work, statements in the standards to the contrary
notwithstanding.)

Exactly how is all this software supposed to adapt to a world where Unicode
local-parts are allowed and everyone gets to pick the case conversion they want
to use?

Of course I can't be sure, but I can tell you what I think is likely to happen:
Assuming EAI deploys sufficiently for this to be an actual issue, everyone is
going to pick a canonicalization and use it. And yes, that means the folks who
want to use dotless i's and whatnot are going to be SOL.
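
For illustration, here is one canonicalization people might plausibly land
on: NFC normalization plus Unicode default (locale-independent) case
folding. This is my guess at a likely candidate, not something any standard
prescribes, and the function name is made up. It also ignores IDNA
processing of the domain.

```python
import unicodedata


def canonical_address(addr: str) -> str:
    """Fold an address for comparison: NFC, default case folding, NFC again."""

    def fold(part: str) -> str:
        folded = unicodedata.normalize("NFC", part).casefold()
        # Case folding can denormalize, so normalize once more.
        return unicodedata.normalize("NFC", folded)

    local, _, domain = addr.rpartition("@")
    return f"{fold(local)}@{fold(domain)}"


# Precomposed and decomposed forms of the same name compare equal.
print(canonical_address("JOSE\u0301@example.com")
      == canonical_address("jos\u00e9@example.com"))

# Dotless i (U+0131) does not fold to the same thing as ASCII "I".
print(canonical_address("\u0131@example.com")
      == canonical_address("I@example.com"))
```

With this choice a Turkish user who expects "I" and dotless i to be the
same letter in different cases gets two distinct addresses, which is
exactly the SOL case.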

                                Ned

_______________________________________________
mailop mailing list
mailop@mailop.org
https://chilli.nosignal.org/cgi-bin/mailman/listinfo/mailop
