Re: [idn] Punicode: Upper-case in example

Martin Duerst Wed, 27 Nov 2002 12:53:17 -0800

Hello Adam,

Many thanks for your quick response.


At 22:50 02/11/26 +0000, Adam M. Costello wrote:

> Martin Duerst <[EMAIL PROTECTED]> wrote:
>
> > In http://www.ietf.org/internet-drafts/draft-ietf-idn-punycode-03.txt,
> > example (I) says:
> >
> >  (I) Russian (Cyrillic):
> >         U+043F u+043E u+0447 u+0435 u+043C u+0443 u+0436 u+0435 u+043E
> >         u+043D u+0438 u+043D u+0435 u+0433 u+043E u+0432 u+043E u+0440
> >         u+044F u+0442 u+043F u+043E u+0440 u+0443 u+0441 u+0441 u+043A
> >         u+0438
> >         Punycode: b1abfaaepdrnnbgefbaDotcwatmq2g4l
> >
> > The presence of the upper-case 'D' (not to say the string 'Dot' :-)
> > is confusing, because it seems completely arbitrary.  There is no
> > upper-case letter in the Cyrillic string.
> >
> > How did the upper-case D get in there?
>
> It corresponds to the uppercase U in one of the code points in the u+
> notation.  The sample Punycode implementation uses the case of the u
> as a 1-bit annotation.

I see. I don't think it is a very good idea to use the case of the
U+ for this distinction, for the following reasons:

1) The convention (u+ -> lower case, U+ -> upper case) is not
   documented anywhere in the Punycode draft (or at least I didn't
   find it). If it is used at all, it should be documented right at
   the start of the examples.

2) The above convention is very easy to overlook, in particular because
   u+ and U+ look so very similar. It is close to the widely established
   U+XXXX notation for code points, but differs slightly from it.

3) Punycode can be used in different ways: on mixed-case strings, on
   lower-case strings that still contain the original casing info, and
   on pure lower-case strings. Maybe there should be separate examples
   for each of these three uses.
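
To make the annotation concrete, here is a minimal Python sketch of a
Punycode encoder that carries one case flag per code point. It follows
my reading of the encoding procedure and parameter values in the draft;
the function names (parse_notation, encode_digit, adapt, encode) are my
own, and this is not the sample implementation itself. The flag of each
non-basic code point ends up in the case of the last digit of its delta,
which is where the 'D' in example (I) should come from.

```python
# Sketch only: a Punycode encoder with per-code-point case flags.
# Parameter values as given in the Punycode draft.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128

EXAMPLE_I = """
U+043F u+043E u+0447 u+0435 u+043C u+0443 u+0436 u+0435 u+043E
u+043D u+0438 u+043D u+0435 u+0433 u+043E u+0432 u+043E u+0440
u+044F u+0442 u+043F u+043E u+0440 u+0443 u+0441 u+0441 u+043A
u+0438
"""


def parse_notation(text):
    """Turn 'U+043F u+043E ...' into (code_points, case_flags).

    Assumption: a capital 'U' sets the flag, a small 'u' clears it.
    """
    cps, flags = [], []
    for token in text.split():
        cps.append(int(token[2:], 16))
        flags.append(token[0] == "U")
    return cps, flags


def encode_digit(d, flag):
    """Digits 0..25 map to a..z (A..Z if flagged), 26..35 map to 0..9."""
    if d < 26:
        return chr(d + (ord("A") if flag else ord("a")))
    return chr(d - 26 + ord("0"))


def adapt(delta, numpoints, firsttime):
    """Bias adaptation between successive deltas."""
    delta = delta // DAMP if firsttime else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def encode(cps, flags):
    """Encode code points; each flag sets the case of one output char."""
    out = []
    # Basic (ASCII) code points are copied first, with their case forced.
    for cp, flag in zip(cps, flags):
        if cp < 0x80:
            out.append(chr(cp).upper() if flag else chr(cp).lower())
    b = h = len(out)
    if b:
        out.append("-")
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(cps):
        m = min(cp for cp in cps if cp >= n)
        delta += (m - n) * (h + 1)
        n = m
        for cp, flag in zip(cps, flags):
            if cp < n:
                delta += 1
            elif cp == n:
                # Write delta as a variable-length integer.
                q, k = delta, BASE
                while True:
                    t = min(max(k - bias, TMIN), TMAX)
                    if q < t:
                        break
                    out.append(encode_digit(t + (q - t) % (BASE - t), False))
                    q = (q - t) // (BASE - t)
                    k += BASE
                # Only the last digit of the delta carries the case flag.
                out.append(encode_digit(q, flag))
                bias = adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(out)


cps, flags = parse_notation(EXAMPLE_I)
print(encode(cps, flags))               # with the U+ annotation
print(encode(cps, [False] * len(cps)))  # pure lower case, no annotation
```

The two print calls correspond to the annotated and the pure lower-case
use; since the digits are case-insensitive when decoding, both outputs
should decode to the same Cyrillic string.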

Regards,   Martin.
