Re: [db-wg] Puny code or UTF-8 (or both)?

Tony Finch via db-wg Mon, 13 Jul 2020 07:44:09 -0700

Peter Koch via db-wg <[email protected]> wrote:
>
> I'm not sure I understand the proposal.


Me too :-)

> "punycode" is primarily IDNA2003 speak

AFAIK IDNA2008 uses punycode in exactly the same way as IDNA2003.
One of the major changes was to get rid of stringprep.

> How would that system deal with conversion failures and/or with
> ambiguities between IDNA2003 and IDNA2008?

My understanding is that we want to support Unicode for lots of fields
in the database, and the suggestion is that it might be easier to jam
punycode into the existing ISO 8859-1 fields.

I think this will be difficult if the database is going to use punycode
for fields that aren't domain names or email addresses, and that don't
have standard encoding rules. In particular I wonder how to handle spaces
and upper/lower case. It might be easier to use base64 than punycode (but
actually I think that's a terrible idea).
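
To make the space and case problem concrete, here is a minimal sketch using Python's built-in `punycode` codec, which implements only the raw RFC 3492 algorithm and none of the IDNA preparation steps. The sample strings and the `roundtrip` helper are made up for illustration; nothing here is a proposal for how the database should behave.

```python
# Raw punycode (RFC 3492) applied to free text rather than a domain label.
# Python's built-in "punycode" codec does no case folding or normalization.
# The sample strings and the roundtrip() helper are illustrative only.

def roundtrip(text: str) -> tuple[bytes, str]:
    """Encode text with raw punycode, then decode it again."""
    encoded = text.encode("punycode")
    return encoded, encoded.decode("punycode")

upper_enc, upper_dec = roundtrip("Zürich Nord")
lower_enc, lower_dec = roundtrip("zürich nord")

# The ASCII characters, including the space and the capital letters, are
# copied into the encoded form unchanged, so the two encodings differ.
print(upper_enc)
print(lower_enc)
print(upper_enc == lower_enc)
```

The space and the capitals pass straight through, so anything comparing or searching these values would need its own rules for what counts as equal. For domain names that job belongs to IDNA; for other fields there is no equivalent.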

There's also George Michaelson's point that the database should have both
the original form of the field as well as a Latin transliteration if
necessary. And this is necessary regardless of how the original form is
encoded (UTF-8, punycode, whatever).

So I think it might be worth adding support for transcoding to/from
punycode domain names and email addresses without waiting for full UTF-8
support, because that's likely to be useful in the long term. (Maybe
something like the DENIC `-T ace` whois option?) But for other fields I
doubt there is a stop-gap that will be easy and useful and not enormously
regrettable in the future.
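
A rough sketch of what that kind of transcoding could look like, assuming Python's built-in `idna` codec. That codec implements the IDNA2003 rules (nameprep), so an IDNA2008-correct version would need a different library. The function names are hypothetical, and this does not reflect how the RIPE database or the DENIC whois server actually does it.

```python
# Transcoding domain names and email addresses to and from their ASCII
# (ACE / "xn--") form. Assumption: Python's built-in "idna" codec, which
# follows IDNA2003. All function names here are hypothetical.

def domain_to_ace(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII-compatible form."""
    return domain.encode("idna").decode("ascii")

def domain_to_unicode(domain: str) -> str:
    """Convert an ASCII-compatible domain name back to Unicode."""
    return domain.encode("ascii").decode("idna")

def email_to_ace(address: str) -> str:
    """Convert only the domain part of an email address.

    The local part is left untouched: punycode/ACE is defined for
    domain labels, not for the part before the "@".
    """
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("not an email address: missing '@'")
    return f"{local}@{domain_to_ace(domain)}"

print(domain_to_ace("bücher.example"))
print(domain_to_unicode("xn--bcher-kva.example"))
print(email_to_ace("info@bücher.example"))
```

Only the domain part is converted, which is about as far as a stop-gap can go: a non-ASCII local part still needs real UTF-8 support.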

Tony.
-- 
f.anthony.n.finch  <[email protected]>  http://dotat.at/
South Fitzroy: Northerly 5 to 7. Moderate or rough, becoming slight or
moderate in northeast. Mainly fair. Good.
