I read in http://www.unicode.org/Public/3.2-Update/CaseFolding-3.2.0.txt

<quote>
0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE

# F: full case folding, mappings that cause strings to grow in length. Multiple characters are separated by spaces.
# T: special case for uppercase I and dotted uppercase I
#    - For non-Turkic languages, this mapping is normally not used.
#    - For Turkic languages (tr, az), this mapping can be used instead of the normal mapping for these characters.
</quote>

If we choose F,
  <I dotabove> and <I><dot above> are both casefolded into the same <i><dot above>.

If we choose T, we get different outputs:
  <I dotabove> --> <i>,  <I><dot above> --> <i><dot above>
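
The two outcomes can be reproduced in a few lines of Python. This is only a minimal sketch: Python's `str.casefold()` applies full case folding without the T mappings, so it stands in for option F here, and `fold_t` is a hypothetical helper that applies only the quoted T line for U+0130 and otherwise falls back to `str.casefold()`.

```python
I_DOT = "\u0130"      # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE


def fold_f(s: str) -> str:
    """Option F: full case folding, as done by str.casefold()."""
    return s.casefold()


def fold_t(s: str) -> str:
    """Option T (assumption): only the quoted T line for U+0130 is modelled."""
    return "".join("i" if ch == I_DOT else ch.casefold() for ch in s)


precomposed = I_DOT            # <I dotabove>
decomposed = "I" + DOT_ABOVE   # <I><dot above>

# F: both spellings fold to <i><dot above>
print(fold_f(precomposed) == fold_f(decomposed))
# T: <I dotabove> folds to <i>, the sequence keeps its dot
print(fold_t(precomposed) == fold_t(decomposed))
```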

Even option F causes this trouble:

 In Turkic languages, <I dotabove> and <i> form a bicameral pair.
 If Turkish users enter an IDN ???<I dotabove>.com, they cannot reach ???<i>.com.
 At this very point, the locale-independence objective of Stringprep casefolding is
 not fulfilled.
 <i><dotabove> and <i> should be unified into one of the two in any locale-independent
 casefolding. You can read about the "locale-independence/non-contextual" objectives in
 UAX#21, in the example of the <I> --> <dotless small i> --> <i> casefolding.
 <dotless i> and <i> are *NOT* a bicameral pair, but for transitive case-insensitive
 equivalence the two are unified into <i> in UAX#21.
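
To make the lookup failure concrete, here is a small Python sketch. It assumes `str.casefold()` as a stand-in for option F; `labels_match` is a hypothetical helper, not part of any stringprep implementation, and the label itself is an invented example.

```python
def labels_match(a: str, b: str) -> bool:
    """Compare two labels after locale-independent full case folding."""
    return a.casefold() == b.casefold()


typed = "\u0130stanbul"   # starts with <I dotabove>; invented example label
registered = "istanbul"   # starts with plain <i>

# The dotted capital folds to <i><dot above>, so the labels differ.
print(labels_match(typed, registered))
# A plain ASCII capital folds to <i> and matches as expected.
print(labels_match("ISTANBUL", registered))
```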

Stringprep should address this issue, and the next CaseFolding-3.2.?.txt should say more
about the required locale-independent/non-contextual casefoldings for <I dotabove>.


Soobok Lee

----- Original Message -----
From: "Mark Davis" <[EMAIL PROTECTED]>
To: "Dan Oscarsson" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; 
<[EMAIL PROTECTED]>;
<[EMAIL PROTECTED]>
Sent: Tuesday, May 07, 2002 12:37 AM
Subject: Re: [idn] 1st stringprep issue: not answered and ignored


> The Unicode Consortium recommends that the tables in StringPrep be
> updated to encompass Unicode 3.2, which was released in March.
>
> As a part of this release, there was one change (in addition to new
> characters) in case folding. The situation regarding the
> dotted/dotless I in the case foldings has been cleaned up by providing
> several options, one of which (full case folding without option T)
> preserves canonical equivalence (although not normalization forms --
> text still needs to be normalized after case folding).
>
> http://www.unicode.org/Public/3.2-Update/CaseFolding-3.2.0.txt
>
> Mark
> __________
>
> http://www.macchiato.com
>
> "Eppur si muove"
>
> ----- Original Message -----
> From: "Dan Oscarsson" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>;
> <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> Sent: Monday, May 06, 2002 01:57
> Subject: Re: [idn] 1st stringprep issue: not answered and ignored
>
>
> >
> > The point that Soobok Lee shows is a very serious matter.
> > The requirement on the ACE form of IDNA is that the same
> > name must always result in the same ACE!!!!
> >
> > If doing casefolding/mapping followed by NFKC results in a
> > different code point sequence than doing NFC, casefolding/mapping and
> > NFKC again, we will get DNS lookup failures because the names do not
> > match. While hopefully most data entered into stringprep will
> > be NFC, some will not.
> >
> > If the above is true, stringprep/nameprep must be changed so that
> > the preparation steps for strings are:
> >
> > 1) See to it that the input string is NFC.
> >
> > 2) all the steps in stringprep.
> >
> >
> >     Dan
> >
> > --
> > Below is Soobok Lee's text:
> > >UTC casefolding (UAX21) is made for char-by-char casefolding, not for
> > >combining sequences, but stringprep blindly applies UAX21 to them.
> > >That is not a problem of UAX21, but rather of stringprep.
> > >
> > >NFCing before casefolding solves this problem, but this suggestion
> > >was also ignored or not discussed in depth.
> > >
> > >Without any modifications to UAX21, NFKC, or NFC, we could cure
> > >these <I><dot above> stringprep errors simply by adding NFC as
> > >step zero in stringprep.
> >
> >

