> David Hopwood wrote:
> > Soobok Lee wrote:
> > > Now that <I><dot-above> is downcased to <i> as an exceptional case,
> > > Then, we have an interesting question:
> > > which direction should we lowercase <I><dot-above><acute> into ?
> >
> > To <i acute>. That is, the equivalence class is:
> >
> >   <I><dot-above><acute>             U+0049 U+0307 U+0301
> >   <I dot-above><acute>              U+0130 U+0301
> >   <I><acute>                        U+0049 U+0301
> >   <I acute>                         U+00CD
> >   <i><acute>                        U+0069 U+0301
> >   <i acute>                         U+00ED
> >   <dotless i><acute>                U+0131 U+0301
> >   <fullwidth I><acute>              U+FF29 U+0301
> >   <fullwidth I><dot-above><acute>   U+FF29 U+0307 U+0301
> >   <fullwidth i><acute>              U+FF49 U+0301
> >
> > and if NFKC is used, also: [snip]
> >
> > <i acute> U+00ED is the normalised representative for all of these.
> >
> > <i><dot-above><acute> is in a different equivalence class (AFAIK, no
> > language uses it, so this doesn't matter).
>
> My mistake; it is used in Lithuanian. The Lithuanian usage would argue
> for <i><dot-above><acute> being in the same equivalence class (since its
> Lithuanian uppercase form is <I acute>). So, another solution that
> should be considered is to use NFC o fold as in the current version of
> stringprep, but map out U+0307 whenever it is attached to a character
> based on 'i' or 'I'. That wouldn't cause any problems for Turkish or
> Azeri. I'll list all the options in another post.
>
This is a good demonstration that, when we deal with symbol usage across
different locales, a procedure like NFC will overlook something. It is
better to list all the input and output characters in a table, which is
easy to check and makes it clear what belongs to each equivalence set.
However, something like NFC is still necessary, both as a tool for
checking consistency and as a guide for building such a table.

Liana
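
As a rough illustration of checking such a table mechanically, here is a
minimal Python sketch. It is not the stringprep mapping and not anything
proposed in the thread verbatim: fold_i is a hypothetical helper that
approximates "map out U+0307 when it is attached to an i-based character",
and it assumes NFKD/NFKC (so the fullwidth forms fold too) and an explicit
dotless-i to i mapping, because the quoted class lists U+0131.

```python
import unicodedata

DOT_ABOVE = "\u0307"
DOTLESS_I = "\u0131"


def fold_i(s: str) -> str:
    """Hypothetical fold: decompose, lowercase, drop U+0307 on 'i', recompose."""
    # NFKD splits precomposed and fullwidth forms into base + combining marks.
    folded = unicodedata.normalize("NFKD", s).lower()
    # Assumption taken from the quoted table: dotless i joins the 'i' class.
    folded = folded.replace(DOTLESS_I, "i")

    out = []
    base_is_i = False
    for ch in folded:
        if unicodedata.combining(ch) == 0:
            # A new base character starts a new combining sequence.
            base_is_i = ch == "i"
        elif base_is_i and ch == DOT_ABOVE:
            # Drop the dot above only when the base letter is 'i'.
            continue
        out.append(ch)
    return unicodedata.normalize("NFKC", "".join(out))


def code_points(s: str) -> str:
    return " ".join(f"U+{ord(c):04X}" for c in s)


# The sequences from the quoted equivalence class, plus the Lithuanian
# <i><dot-above><acute> form from the follow-up.
TABLE = [
    "\u0049\u0307\u0301",
    "\u0130\u0301",
    "\u0049\u0301",
    "\u00cd",
    "\u0069\u0301",
    "\u00ed",
    "\u0131\u0301",
    "\uff29\u0301",
    "\uff29\u0307\u0301",
    "\uff49\u0301",
    "\u0069\u0307\u0301",
]

if __name__ == "__main__":
    for seq in TABLE:
        print(f"{code_points(seq):<24} -> {code_points(fold_i(seq))}")
```

Printing the input and output code points side by side gives the kind of
table suggested above, while the normalization calls do the consistency
checking. Whether dotless i should really fold to i is a policy question
for Turkish and Azeri that this sketch does not settle.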
