At 22:21 01/07/16 +0200, Patrik Fältström wrote:

>It's also the case that I only see "I support UTF-8" when a real solution
>with UTF-8 would be something like IDNA, with UTF-8 as the output and not
>ACE encoded strings, let's call it IDNU. :-)
Yes. Actually it would be quite easy to convert the IDNA draft to an IDNU draft, more or less by replacing ACE with UTF-8. Of course a second pass to smooth some rough edges may be required.

>Because of the fact that some software already do UTF-8 (without nameprep),
>and people on this list which should know better say "UTF-8 and not ACE"
>when they should point at the actual algorithm used, I am extremely worried
>that what I see on this list is IDNA (which include nameprep) or "just send
>UTF-8 without nameprep".

In reality, "just send UTF-8 with nameprep" is definitely what's needed, and it is much better than "just send UTF-8 without nameprep".

>If not, explain how the software which today send UTF-8 on the wire will
>stop doing that the day we release the IDNU proposal? Who will suffer and
>discover what software do nameprep and not?

This software will most probably be upgraded. If it's browsers, I'll definitely contribute by talking to the right people. I'm very sure browser vendors prefer UTF-8 with nameprep over ACE with nameprep.

Anyway, I think that nameprep is in various ways very important, but on the other hand, its importance has also been highly overestimated. So the amount of suffering is much more limited than it may seem. It is very important to understand that in most contexts, when somebody takes a proper domain name from paper and inputs it, the chance that the label is changed by nameprep (except for lowercasing, which was part of the UTF-8 proposal from the start) is *very* small. There are some well-known and important exceptions, such as half-width/full-width kana and some implementations of Vietnamese (windows-1258), but for many areas of the world, nameprep, in its relevant parts, just confirms what's done anyway.
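A short sketch with Python's unicodedata module illustrates the point. Note that this is only a rough approximation of nameprep, made up for illustration: the real profile also prohibits certain characters and applies bidi checks; here it is reduced to default lowercasing followed by Unicode normalization form KC.

```python
import unicodedata

def approx_nameprep(label: str) -> str:
    # Very rough approximation of nameprep, for illustration only.
    # The real profile does more (prohibited characters, bidi rules);
    # here: default Unicode lowercasing, then normalization form KC.
    return unicodedata.normalize("NFKC", label.lower())

# A typical label is unchanged except for lowercasing:
print(approx_nameprep("Example"))                       # example

# Half-width katakana KA (U+FF76) is one of the well-known
# exceptions: NFKC folds it to full-width KA (U+30AB).
print(approx_nameprep("\uFF76") == "\u30AB")            # True

# Much of NFKC's compatibility mapping is "garbage collection" of
# characters nobody would type into a domain name, e.g. U+3392
# SQUARE MHZ, which NFKC expands to the plain letters "MHz":
print(unicodedata.normalize("NFKC", "\u3392"))          # MHz
# NFC, by contrast, leaves such characters alone:
print(unicodedata.normalize("NFC", "\u3392") == "\u3392")  # True

# Default Unicode lowercasing is not locale-aware: a Turkish user
# expects dotless i (U+0131) as the lowercase of 'I', but gets:
print("I".lower())                                      # i
```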
The reason for this is that this property is *designed* into NFC, and that most of NFKC (but not all of it) is garbage collection: it covers many things that are in some way similar but that the user would never want to type in (because they really look different from the real thing) and that are in addition difficult to type; a lot of examples can be found, e.g., in the blocks U+32xx and U+33xx.

So what we very much need is a very clear definition of which names are acceptable and which names are not, and a high enough checking/compliance rate (somewhere between 50% and 90%) on the request side to put enough pressure on the registry side to make compliance on that side 100%.

What is also quite beneficial is to have some clear guidelines as to which characters should be mapped to others before lookup and which not. Half-width/full-width is a typical example. But currently, we are e.g. forbidding somebody who makes Turkish software to use the case mapping that a Turkish user would expect, just because we want to avoid problems if a user without any idea about Turkish casing rules ever uses that software.

The current tendencies of 'better check once too often than not often enough' (with which I agree in principle) and 'better uniform and sometimes wrong than according to user expectations' (about which I have serious doubts) seem to have led us to overshoot our goals.

Regards,   Martin.
