Re: [dane] encoding internationalized mail addresses in the DNS

John Levine Wed, 01 Apr 2015 18:20:13 -0700

>> As far as I can tell, the base32 approach handles everything that
>> hashing does
>
>The draft has a "characters/octets" conflation problem, as does the
>sequence of successors to RFC-822.
>
>This likely dates back to the before-UTF and before-IDNA era, and how
>the docs were updated.


Until RFCs 6530-6532, SMTP commands were (and mostly still are) ASCII
only, and mail message headers were and are ASCII only.  When RFC 5321
says octets, those octets hold ASCII characters.

With internationalised mail as described in RFCs 6530-6532, there is a
new SMTP extension, SMTPUTF8, which lets the client and server agree
that they allow UTF-8 as well as ASCII in the commands and message
headers, but the octet limits don't change.  In particular, local-part
is limited to 64 octets whether it contains only ASCII or also
non-ASCII UTF-8.
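
To make the octet limit concrete, here is a minimal sketch in Python.
The function names are made up for illustration, and it assumes the
address contains an "@" and that the local-part is whatever precedes
the last one.  It is not how any particular MTA does the check.

```python
# Limit on local-part from RFC 5321, counted in octets.
MAX_LOCAL_PART_OCTETS = 64


def local_part_octets(address: str) -> int:
    """Return the UTF-8 octet length of the local-part of an address.

    Assumption: the local-part is everything before the last "@".
    """
    local_part, _sep, _domain = address.rpartition("@")
    return len(local_part.encode("utf-8"))


def local_part_fits(address: str) -> bool:
    """True if the local-part is within the 64 octet limit."""
    return local_part_octets(address) <= MAX_LOCAL_PART_OCTETS


if __name__ == "__main__":
    # 40 two-octet characters: 40 characters, but 80 octets.
    print(local_part_fits("\u00e9" * 40 + "@example.com"))
```

The point of the sketch is only that the count is taken after UTF-8
encoding, so the same check works for ASCII and non-ASCII local-parts.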

>- when and why was the 64-length local-part introduced? (It appeared
>magically in RFC 2821 4.5.3.1)

It was documenting existing practice.  See below.

>- when it changed from "64 characters" to "64 octets" in RFC 5321,
>this implicitly affected UTF-2/3/4 languages - was this impact
>considered?

There was nothing to consider -- non-ASCII wasn't valid in any SMTP
transaction, so 64 characters were 64 octets.

>- is the 64 octet limit sufficiently universally enforced to be
>considered an actual de-facto standard?

As I learned the hard way when I was experimenting with BATV (a hack
that put signatures in bounce addresses), if your local-parts are
longer than 64 octets, things break.  I think MS Exchange was one of
the strictest, and it's all over the place.

>- is it really the case that UTF-2 mailboxes are <= 32 characters,
>UTF-3 mailboxes <= 21 characters, and UTF-4 mailboxes <= 16
>characters?

If you are referring to multibyte UTF-8 characters, a UTF-8 string
typically has a mixture of characters whose encodings are of different
numbers of bytes.  The SMTP and mail header limit is on the UTF-8
octets, not the decoded Unicode characters.
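
As a rough illustration (hypothetical helper names, nothing from the
draft), the figures of 32, 21 and 16 characters only hold if every
character in the local-part encodes to the same number of octets.
Real strings mix widths, so the only reliable count is the encoded
length:

```python
MAX_LOCAL_PART_OCTETS = 64


def utf8_widths(text: str) -> list[int]:
    """Number of UTF-8 octets used by each character of text."""
    return [len(ch.encode("utf-8")) for ch in text]


def max_chars_at_width(width: int) -> int:
    """Most characters that fit in 64 octets if all have this width."""
    return MAX_LOCAL_PART_OCTETS // width


if __name__ == "__main__":
    # One character each of 1, 2, 3 and 4 octets: 4 characters, 10 octets.
    sample = "a\u00e9\u7528\U0001F600"
    print(utf8_widths(sample), sum(utf8_widths(sample)))
    print([max_chars_at_width(w) for w in (2, 3, 4)])
```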

>Hashing, by definition, does not care about the length of local-part,
>and thus is less restrictive. In effect, it ignores that 64 character
>limit. (I'm not entirely sure if that is a pro, con, or irrelevant.)

I think it's irrelevant -- I don't think I've ever seen a local part
longer than 64 octets.  Most are much shorter.

>However, the _other_ issue is that the length of local-part _encoding_
>implicitly limits the parent domain name length. Using 2 x 52-octet
>long labels, reduces the maximum domain name length to 255-106=149
>octets, minus whatever the "_mailbox" label's length is, and is
>applicable to IDNA and non-IDNA domain names alike.

I suppose that could be a problem, but it's one we might run into at
some point whatever we do. 
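
The arithmetic in the quoted paragraph can be checked with a short
sketch.  It assumes standard padded base32 (Python's base64.b32encode)
split into 52-character labels and a "_mailbox" marker label, as in the
quoted text; the draft's actual encoding details are not given in this
message, so treat all of that as an assumption.  The helper names are
hypothetical.

```python
import base64

MAX_NAME_OCTETS = 255  # overall DNS name length limit


def encoded_labels(local_part: str, label_len: int = 52) -> list[str]:
    """Base32-encode the UTF-8 local-part and split it into labels."""
    encoded = base64.b32encode(local_part.encode("utf-8")).decode("ascii")
    return [encoded[i:i + label_len] for i in range(0, len(encoded), label_len)]


def wire_octets(labels: list[str]) -> int:
    """Octets used by the labels, counting one length octet per label."""
    return sum(len(label) + 1 for label in labels)


def octets_left_for_domain(local_part: str, marker: str = "_mailbox") -> int:
    """Octets remaining for the mail domain after the encoded labels."""
    labels = encoded_labels(local_part) + [marker]
    return MAX_NAME_OCTETS - wire_octets(labels)


if __name__ == "__main__":
    longest = "a" * 64  # a maximum-length, 64 octet local-part
    print(wire_octets(encoded_labels(longest)))  # the two encoded labels
    print(octets_left_for_domain(longest))
```

Under those assumptions a 64 octet local-part costs 106 octets for the
two labels, which leaves 149, and the "_mailbox" label takes another 9.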

Here's a data point: I run abuse.net, a place where people register
abuse contact addresses primarily for their mail domains.  People have
registered info for over 500,000 domains during the last decade.  I
just checked, and the longest names for which anyone has ever
registered a contact are 67 characters, and they look pretty bogus:

mp3-mp3-mp3-mp3-mp3-mp3-mp3-mp3-mp3-mp3-mp3-mp3-mp3-mp3-mp3-mp3.net
realestateagentnewhomeforsalerealtysjoselistingsjoserestaurants.com
sanjoserestaurantitalianchinesejapaneseindianmexicanpizzafrench.com

People can and do register names with three or four name components,
but these are the longest overall.  Everything else is shorter, mostly
much shorter.

So it looks to me as though the chance of a non-contrived mail address
having a domain approaching 140 octets is vanishingly small.

R's,
John

_______________________________________________
dane mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/dane
