At 05:25 AM 7/18/2004, Peter Kirk wrote:
I accept that there might be some script-specific cases in which
particular accents should not be removed. The breve in Cyrillic i kratkoe
might be an example; but then this might be rather too language-specific
as well. But these should be clearly defined and justified exceptions,
rather than their possible existence being a reason to restrict the
general applicability of accent and diacritic folding.
I was thinking rather more of Khmer, where some characters that are
considered letters are given gc=Mn. In that case, folding would be very
inappropriate.
So the answer has to be to limit the removal of diacritical marks in
AccentFolding to those that are truly *accents*. That's a subset of gc=Mn.
There are two options for a starting set:
Select all 'accents' (note, not base forms) that occur in some precomposed
character, and then add additional ones on a case-by-case basis (e.g.
stroke overlay).
Or, start with all gc=Mn from the 0300 and 1DC0 blocks (the latter will be
part of 4.1), and make some principled additions / deletions.
All script-specific non-spacing marks for Indic scripts etc. should not be
part of 'AccentFolding', in my opinion.
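
To make the second option concrete, here is a rough sketch in Python. It is
only an illustration: the function name candidate_accents is made up, the
block ranges are hard-coded, and the result depends on the Unicode version
of the Python build (the 1DC0 block is empty before 4.1). The principled
additions and deletions are not attempted.

```python
import unicodedata

# Assumed block ranges: Combining Diacritical Marks (0300..036F) and
# Combining Diacritical Marks Supplement (1DC0..1DFF).
ACCENT_BLOCKS = [(0x0300, 0x036F), (0x1DC0, 0x1DFF)]

def candidate_accents():
    """Return the gc=Mn characters in the two blocks as a starting set."""
    return {
        chr(cp)
        for first, last in ACCENT_BLOCKS
        for cp in range(first, last + 1)
        if unicodedata.category(chr(cp)) == "Mn"
    }

accents = candidate_accents()
```

With this starting set, U+0301 COMBINING ACUTE ACCENT is included, while a
Khmer vowel sign such as U+17B7 is left out even though it is also gc=Mn.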
.. when I look more closely at AccentFolding as defined I see a problem
with it. It is specified as affecting only "Latin/Greek/Cyrillic
characters with canonical decomposition". But this is inadequate because
there are many cases of Latin/Greek/Cyrillic characters (and most cases of
Hebrew ones) where an accent should be removed even though there is no
precomposed form encoded and so canonical decomposition
Correct. Whatever the set of combining marks is, we then need to define a
set of base characters. We could simply use sc=Latin + sc=Greek +
sc=Cyrillic as a starting set, to treat all accented characters equally.
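
A sketch of how the two sets might work together, again in Python and again
only under assumptions. The standard library does not expose the script
property, so is_lgc_base approximates sc=Latin/Greek/Cyrillic by looking at
the character name, which is cruder than the real property. The names
ACCENTS, is_lgc_base and accent_fold are invented for this example, and no
exceptions (such as Cyrillic i kratkoe) are carved out.

```python
import unicodedata

# Assumed accent set: gc=Mn characters of the 0300 block only.
ACCENTS = {
    chr(cp) for cp in range(0x0300, 0x0370)
    if unicodedata.category(chr(cp)) == "Mn"
}

SCRIPT_PREFIXES = ("LATIN ", "GREEK ", "CYRILLIC ")

def is_lgc_base(ch):
    """Rough stand-in for sc=Latin/Greek/Cyrillic, based on the name."""
    return unicodedata.name(ch, "").startswith(SCRIPT_PREFIXES)

def accent_fold(text):
    """Drop accents that follow a Latin/Greek/Cyrillic base character."""
    out = []
    fold_here = False
    for ch in unicodedata.normalize("NFD", text):
        if ch in ACCENTS:
            if fold_here:
                continue  # accent on a qualifying base: remove it
        elif not unicodedata.category(ch).startswith("M"):
            # A new base character decides whether its accents are folded.
            fold_here = is_lgc_base(ch)
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))
```

Because the text is decomposed first, an accent is removed whether or not a
precomposed form exists: both "é" and "q" followed by U+0301 lose the acute.
Marks outside the accent set, such as Hebrew points, are left alone.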
What about other scripts:
If you feel that Hebrew folding to unpointed is something that should
happen every time other accents are folded, we can add Hebrew. Or we can
make a separate folding, HebrewMarksFolding, that people can invoke
optionally. I tend to prefer the latter. Since for Hebrew (the languages),
a folding to unpointed might be one of the foldings that someone might want
to apply permanently, it should be separately named and defined, on the
principle that the foldings should be building blocks.
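
As a sketch of the building-block idea, in Python: a separately named Hebrew
folding plus a small helper that chains foldings. Everything here is an
assumption for illustration. The range U+0591..U+05C7 with gc=Mn is one
possible definition of the marks to remove (it takes the cantillation marks
along with the points), compose is a made-up helper, and str.casefold merely
stands in for some other folding.

```python
import unicodedata

def hebrew_marks_fold(text):
    """Remove gc=Mn characters in U+0591..U+05C7 (assumed mark range)."""
    return "".join(
        ch for ch in text
        if not (0x0591 <= ord(ch) <= 0x05C7
                and unicodedata.category(ch) == "Mn")
    )

def compose(*foldings):
    """Chain foldings (callables from str to str), applied left to right."""
    def apply(text):
        for folding in foldings:
            text = folding(text)
        return text
    return apply

# Hebrew folding is applied only by callers that ask for it.
fold_with_hebrew = compose(hebrew_marks_fold, str.casefold)
```

Kept separate like this, the Hebrew folding can be applied on its own, for
instance permanently to stored text, or combined with others when needed.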
A./