>> As far as I can tell, the base32 approach handles everything that
>> hashing does
>
>The draft, has a "characters/octets" conflation problem, as do the
>sequence of successors to RFC-822.
>
>This likely dates back to the before-UTF and before-IDNA era, and how
>the docs were updated.

Until RFC 6530-6532, SMTP commands were (and mostly still are) ASCII
only, and mail message headers were and are ASCII only. When 5321 says
octets, those octets hold ASCII characters.

With internationalised mail as described in RFCs 6530-6532, there is a
new SMTP extension, SMTPUTF8, which allows the client and server to
agree that they allow UTF-8 in the commands and message headers as well
as ASCII, but the octet limits don't change. In particular, local-part
is limited to 64 octets whether it contains only ASCII or UTF-8.

>- when and why was the 64-length local-part introduced? (It appeared
>magically in RFC 2821 4.5.3.1)

It was documenting existing practice. See below.

>- when it changed from "64 characters" to "64 octets" in RFC 5321,
>this implicitly affected UTF-2/3/4 languages - was this impact
>considered?

There was nothing to consider -- non-ASCII wasn't valid in any SMTP
transaction so the 64 characters was 64 octets.

>- is the 64 octet limit sufficiently universally enforced to be
>considered an actual de-facto standard?

As I learned the hard way when I was experimenting with BATV (a hack
that put signatures in bounce addresses), if your local-parts are
longer than 64 octets, things break. I think MS Exchange was one of
the strictest, and it's all over the place.

>- is it really the case that UTF-2 mailboxes are <= 32 characters,
>UTF-3 mailboxes <= 21 characters, and UTF-4 mailboxes <= 16
>characters?

If you are referring to multibyte UTF-8 characters, a UTF-8 string
typically has a mixture of characters whose encodings are of different
numbers of bytes. The SMTP and mail header limit is on the UTF-8
octets, not the decoded Unicode characters.

>Hashing, by definition, does not care about the length of local-part,
>and thus is less restrictive. In effect, it ignores that 64 character
>limit. (I'm not entirely sure if that is a pro, con, or irrelevant.)

I think it's irrelevant -- I don't think I've ever seen a local part
longer than 64.
Most are much shorter.

>However, the _other_ issue is that the length of local-part _encoding_
>implicitly limits the parent domain name length. Using 2 x 52-octet
>long labels, reduces the maximum domain name length to 255-106=149
>octets, minus whatever the "_mailbox" label's length is, and is
>applicable to IDNA and non-IDNA domain names alike.

I suppose that could be a problem, but it's one we might run into at
some point whatever we do. Here's a data point: I run abuse.net, a
place where people register abuse contact addresses primarily for
their mail domains. People have registered info for over 500,000
domains during the last decade. I just checked, and the longest names
for which anyone has ever registered a contact are 67 characters, and
they look pretty bogus:

  mp3-mp3-mp3-mp3-mp3-mp3-mp3-mp3-mp3-mp3-mp3-mp3-mp3-mp3-mp3-mp3.net
  realestateagentnewhomeforsalerealtysjoselistingsjoserestaurants.com
  sanjoserestaurantitalianchinesejapaneseindianmexicanpizzafrench.com

People can and do register names with three or four name components,
but these are the longest overall. Everything else is shorter, mostly
much shorter.

So it looks to me that the chances of a non-contrived mail address
having a domain approaching 140 octets are vanishingly small.

R's,
John

_______________________________________________
dane mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/dane
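
A minimal sketch of the octet arithmetic discussed in this message,
assuming Python 3. The function names are hypothetical and come from no
RFC or draft. It counts the UTF-8 octets of a local-part against the
64-octet limit, and it reproduces the quoted 255-106=149 budget on the
assumption that each label costs its own length plus one octet.

```python
from typing import Iterable


def octet_length(text: str) -> int:
    """Return the number of octets in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))


def local_part_fits(local_part: str, limit: int = 64) -> bool:
    """True if the local-part is within the octet limit (64 per RFC 5321)."""
    return octet_length(local_part) <= limit


def remaining_domain_octets(label_octets: Iterable[int], max_name: int = 255) -> int:
    """Octets left for the parent domain after the given labels.

    Assumption: each label is charged its length plus one octet,
    which is how 2 x 52 becomes 106 in the quoted text.
    """
    return max_name - sum(n + 1 for n in label_octets)


# The limit applies to octets, so the character count that fits depends
# on the mix of characters actually used in the local-part.
ascii_name = "jose"            # 4 characters, 4 octets
accented_name = "jos\u00e9"    # 4 characters, 5 octets
three_octet_char = "\u4e2d"    # encodes to 3 octets in UTF-8
four_octet_char = "\U0001F600" # encodes to 4 octets in UTF-8
```

For example, local_part_fits(three_octet_char * 21) holds (63 octets)
while 22 such characters do not fit (66 octets), and
remaining_domain_octets([52, 52]) gives 149.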
