I read the following in http://www.unicode.org/Public/3.2-Update/CaseFolding-3.2.0.txt

<quote>
0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE

# F: full case folding, mappings that cause strings to grow in length. Multiple characters are separated by spaces.
# T: special case for uppercase I and dotted uppercase I
#    - For non-Turkic languages, this mapping is normally not used.
#    - For Turkic languages (tr, az), this mapping can be used instead of the normal mapping for these characters.
</quote>

If we choose F, <I dotabove> and <I><dot above> are both casefolded into
the same <i><dot above>.

If we choose T, we get different outputs:

  <I dotabove>    --> <i>
  <I><dot above>  --> <i><dot above>

Even option F causes trouble. In Turkic languages, <I dotabove> and <i>
form the bicameral pair. If Turkish users enter an IDN ???<I dotabove>.com,
they cannot reach ???<i>.com, because F folds <I dotabove> to
<i><dot above>, not to <i>. At this point, the locale-independence
objective of Stringprep casefolding is not met.

<i><dotabove> and <i> should be unified into one or the other in any
locale-independent casefolding.

You can read about the "locale-independence/non-contextual" objectives in
UAX#21, in the example of the <I> -> <dotless small i> --> <i> casefolding.
<dotless i> and <i> are *NOT* a bicameral pair, but for transitive
case-insensitive equivalence, those two are unified into <i> in UAX#21.

Stringprep should address this issue, and the next CaseFolding-3.2.?.txt
should say more about the required locale-independent/non-contextual
casefoldings for <I dotabove>.

Soobok Lee

----- Original Message -----
From: "Mark Davis" <[EMAIL PROTECTED]>
To: "Dan Oscarsson" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Tuesday, May 07, 2002 12:37 AM
Subject: Re: [idn] 1st stringprep issue: not answered and ignored

> The Unicode Consortium recommends that the tables in StringPrep be
> updated to encompass Unicode 3.2, which was released in March.
>
> As a part of this release, there was one change (in addition to new
> characters) in case folding. The situation regarding the
> dotted/dotless I in the case foldings has been cleaned up by providing
> several options, one of which (full case folding without option T)
> preserves canonical equivalence (although not normalization forms --
> text still needs to be normalized after case folding).
>
> http://www.unicode.org/Public/3.2-Update/CaseFolding-3.2.0.txt
>
> Mark
> __________
>
> http://www.macchiato.com
>
> "Eppur si muove"
>
> ----- Original Message -----
> From: "Dan Oscarsson" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>;
> <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> Sent: Monday, May 06, 2002 01:57
> Subject: Re: [idn] 1st stringprep issue: not answered and ignored
>
>
> >
> > The point that Soobok Lee shows is a very serious matter.
> > The requirement on the ACE form of IDNA is that the same
> > name must always result in the same ACE!!!!
> >
> > If doing casefolding/mapping followed by NFKC results in a
> > different code point sequence than doing NFC, casefolding/mapping and
> > NFKC again, we will get DNS lookup failures due to names do not
> > match. While hopefully most data entered into stringprep will
> > be NFC, some will not.
> >
> > If the above is true, stringprep/nameprep must be changed so that
> > the preparation steps for strings are:
> >
> > 1) See to that input strings is NFC.
> >
> > 2) all the steps in stringprep.
> >
> >
> > Dan
> >
> > --
> > Below is Soobok Lee's text:
> > >UTC casefolding (UAX21) is made for char-by-char casefolding, not for
> > >combining sequences, but stringprep blindly applies UAX21 into them.
> > >That is not the problem of UAX21, rather of the stringprep.
> > >
> > >NFCing before casefolding solves this problem, but this suggestion
> > >was also ignored or not discussed in depth.
> > >
> > >Without any modifications to UAX21 and NFKC and NFC, we could cure
> > >this <I><dot above> stringprep errors, simply by adding NFC in
> > >step zero in stringprep.
> >
> >
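
The foldings discussed in this thread can be checked with a short script.
This is only a minimal sketch under stated assumptions, not Stringprep
itself. The function names (fold_f, fold_t, nfc_then_fold_t) are made up
for illustration. It assumes Python's str.casefold() behaves like option F
for these characters (it uses the Unicode data bundled with the
interpreter, not the 3.2.0 file), and fold_t models only the single
"0130; T; 0069" entry quoted in this message.

```python
import unicodedata

I_DOT_ABOVE = "\u0130"   # <I dotabove>: LATIN CAPITAL LETTER I WITH DOT ABOVE
I_PLUS_DOT = "I\u0307"   # <I><dot above>: canonically equivalent sequence


def fold_f(s: str) -> str:
    """Option F (assumption: str.casefold() applies the full folding)."""
    return s.casefold()


def fold_t(s: str) -> str:
    """Option T, hypothetical model of only the quoted 0130 -> 0069 entry."""
    return s.replace(I_DOT_ABOVE, "i").casefold()


def nfc_then_fold_t(s: str) -> str:
    """'NFC in step zero': normalize to NFC first, then fold with option T."""
    return fold_t(unicodedata.normalize("NFC", s))


def show(s: str) -> str:
    """Render a string as a list of code points for easy comparison."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


if __name__ == "__main__":
    for name, fold in (("F", fold_f), ("T", fold_t), ("NFC+T", nfc_then_fold_t)):
        a, b = fold(I_DOT_ABOVE), fold(I_PLUS_DOT)
        print(f"{name:6} <I dotabove> -> {show(a):14} "
              f"<I><dot above> -> {show(b):14} same={a == b}")
```

Under these assumptions, F and NFC+T should each give one result for both
inputs, while T alone should give two different results. Neither F nor
NFC+T unifies <i><dot above> with <i>, which is the remaining issue
raised above.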
