Hello Adam,

Many thanks for your quick response.
At 22:50 02/11/26 +0000, Adam M. Costello wrote:
Martin Duerst <[EMAIL PROTECTED]> wrote:

> In http://www.ietf.org/internet-drafts/draft-ietf-idn-punycode-03.txt,
> example (I) says:
>
>    (I) Russian (Cyrillic):
>        U+043F u+043E u+0447 u+0435 u+043C u+0443 u+0436 u+0435 u+043E
>        u+043D u+0438 u+043D u+0435 u+0433 u+043E u+0432 u+043E u+0440
>        u+044F u+0442 u+043F u+043E u+0440 u+0443 u+0441 u+0441 u+043A
>        u+0438
>        Punycode: b1abfaaepdrnnbgefbaDotcwatmq2g4l
>
> The presence of the upper-case 'D' (not to say the string 'Dot' :-)
> is confusing, because it seems completely arbitrary. There is no
> upper-case letter in the Cyrillic string.
> How did the upper-case D get in there?

It corresponds to the uppercase U in one of the code points in the u+
notation. The sample Punycode implementation uses the case of the u as
a 1-bit annotation.
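To make the annotation concrete, here is a minimal Python sketch. It assumes Python's built-in `punycode` codec, which is not the sample implementation from the draft and, as far as I know, does not carry case annotations (it emits lower-case digits only). The helper `parse_notation` and the variable names are hypothetical, chosen for illustration. The sketch reads the U+/u+ notation of example (I), records which code points are flagged with an upper-case U, and compares the codec's output with the string given in the draft.

```python
import re

# Example (I) from the draft, in its U+/u+ notation.
NOTATION = """
U+043F u+043E u+0447 u+0435 u+043C u+0443 u+0436 u+0435 u+043E
u+043D u+0438 u+043D u+0435 u+0433 u+043E u+0432 u+043E u+0440
u+044F u+0442 u+043F u+043E u+0440 u+0443 u+0441 u+0441 u+043A
u+0438
"""
DRAFT_PUNYCODE = "b1abfaaepdrnnbgefbaDotcwatmq2g4l"


def parse_notation(notation):
    """Hypothetical helper: return (text, flags) for a U+/u+ sequence.

    flags[i] is True when code point i was written with an upper-case U,
    which is the 1-bit annotation described above.
    """
    pairs = re.findall(r"([Uu])\+([0-9A-Fa-f]{4,6})", notation)
    text = "".join(chr(int(hex_digits, 16)) for _, hex_digits in pairs)
    flags = [u == "U" for u, _ in pairs]
    return text, flags


text, flags = parse_notation(NOTATION)

# Encode with the standard codec, which ignores the annotation.
encoded = text.encode("punycode").decode("ascii")

# Positions where the draft's string uses an upper-case character.
upper_positions = [i for i, c in enumerate(DRAFT_PUNYCODE) if c.isupper()]

print("flagged code points:", [i for i, f in enumerate(flags) if f])
print("codec output:       ", encoded)
print("draft output:       ", DRAFT_PUNYCODE)
print("upper-case chars:   ", [DRAFT_PUNYCODE[i] for i in upper_positions])
```

Compared case-insensitively, the two strings agree; the only upper-case character in the draft's version is the 'D', and exactly one code point in the notation is written with an upper-case U.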
I see. I don't think it is a very good idea to use the case of the U+
for this distinction, for the following reasons:

1) The convention u+ -> lower case, U+ -> upper case is not documented
   anywhere in the punycode draft (or at least I didn't find it). If
   used at all, it should be documented right at the start of the
   examples.

2) The convention is very easy to overlook, in particular because u+
   and U+ look so similar. It is close to a widely established
   convention, but differs slightly.

3) Punycode can be used in different ways: on mixed-case strings, on
   lower-case strings that still carry the original casing information,
   and on pure lower-case strings. Maybe there should be separate
   examples for all three uses.

Regards, Martin.
