Major Defect in Combining Classes of Tibetan Vowels

Christopher John Fynn Sat, 21 Jun 2003 18:57:11 -0700

In Unicode's UnicodeData.txt
 (http://www.unicode.org/Public/UNIDATA/UnicodeData.txt)
 0F7E has a Canonical Combining Class Value (CCCV) of 0;
 0F71 a CCCV of 129;
 0F72 0F7A 0F7B 0F7C 0F7D and 0F80 a CCCV of 130;
 0F74 a CCCV of 132;
 and 0F82 and 0F83 have a CCCV of 230.
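
 The values listed above can be checked against whatever copy of the
 Unicode Character Database a given runtime ships with. A minimal
 sketch, assuming Python's standard unicodedata module; the names
 TIBETAN_MARKS and combining_classes are illustrative only and not
 part of any standard:

```python
import unicodedata

# Code points named in the message, in the order they are listed there.
TIBETAN_MARKS = [
    0x0F7E,
    0x0F71,
    0x0F72, 0x0F7A, 0x0F7B, 0x0F7C, 0x0F7D, 0x0F80,
    0x0F74,
    0x0F82, 0x0F83,
]


def combining_classes(code_points):
    """Map each code point to its Canonical Combining Class value."""
    return {cp: unicodedata.combining(chr(cp)) for cp in code_points}


ccc = combining_classes(TIBETAN_MARKS)
for cp, value in ccc.items():
    print(f"U+{cp:04X} ccc={value}")
```

 The result depends on the Unicode data bundled with the interpreter,
 so it reports what that copy of the database says rather than what
 the values ought to be.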


 Under normal Tibetan & Dzongkha spelling, writing, and input rules,
 a Tibetan script stack should be entered and written as: one headline
 consonant (0F40-0F6A), any subjoined consonant(s) (0F90-0FBC),
 achung (0F71), shabkyu (0F74), any above-headline vowel(s)
 (0F72 0F7A 0F7B 0F7C 0F7D and 0F80), and any ngaro (0F7E, 0F82
 and 0F83).

 So following normal Tibetan & Dzongkha input and spelling rules
 the relative ordering of these characters should be:
 A.  0F71
 B.  0F74
 C.  0F72 0F7A 0F7B 0F7C 0F7D and 0F80
 D.  0F7E,  0F82 and 0F83
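
 The A-D ordering above could be written down as a rank table. This
 is only a sketch of the ordering described in this message, not a
 standardized mechanism; INTENDED_RANK and sort_marks_intended are
 hypothetical names introduced for illustration:

```python
# Rank of each mark in the A-D order given above (lower sorts first).
INTENDED_RANK = {
    0x0F71: 0,                                   # A. achung
    0x0F74: 1,                                   # B. shabkyu
    **{cp: 2 for cp in (0x0F72, 0x0F7A, 0x0F7B,
                        0x0F7C, 0x0F7D, 0x0F80)},  # C. above-headline vowels
    **{cp: 3 for cp in (0x0F7E, 0x0F82, 0x0F83)},  # D. ngaro
}


def sort_marks_intended(marks: str) -> str:
    """Stable-sort a run of the marks listed above into the A-D order.

    Raises KeyError for any character that is not in INTENDED_RANK.
    """
    return "".join(sorted(marks, key=lambda ch: INTENDED_RANK[ord(ch)]))


example = sort_marks_intended("\u0f7e\u0f72\u0f74\u0f71")
print([f"U+{ord(ch):04X}" for ch in example])
```

 Because Python's sorted() is stable, marks that share a rank keep
 the order in which they were entered.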

 Because "canonical decomposition" or "normalisation" reorders
 these combining characters by their CCCV, they can end up in an
 order that contradicts the one above: for example, shabkyu (0F74,
 CCCV 132) is moved after any above-headline vowel (CCCV 130), and
 0F7E (CCCV 0) is never moved while 0F82 and 0F83 (CCCV 230) can
 be. This causes difficulties with culturally correct collation
 (where 0F7E, 0F82 and 0F83 should have an equal value) and, in
 particular, makes lookups in smart fonts far more complex and
 inefficient than they should have to be.
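
 The reordering can be observed directly. A minimal sketch, assuming
 Python's standard unicodedata module; the input sequences are
 artificial ones chosen only to trigger reordering, and the helper
 name code_points is illustrative:

```python
import unicodedata


def code_points(text):
    """Render a string as a list of U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in text]


# KA + shabkyu (ccc 132) + gigu (ccc 130), entered in the order A-D above.
typed = "\u0f40\u0f74\u0f72"
normalised = unicodedata.normalize("NFD", typed)

# KA + 0F83 (ccc 230) + gigu (ccc 130): the two marks are swapped.
with_0f83 = unicodedata.normalize("NFD", "\u0f40\u0f83\u0f72")

# KA + 0F7E (ccc 0) + gigu (ccc 130): a class of 0 blocks any swap.
with_0f7e = unicodedata.normalize("NFD", "\u0f40\u0f7e\u0f72")

print(code_points(typed), "->", code_points(normalised))
print(code_points(with_0f83))
print(code_points(with_0f7e))
```

 The last two cases show 0F7E and 0F83 being treated differently by
 normalisation even though both are ngaro.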

 (In Tibetan script fonts, 0F71 and 0F74 are often ligated with the
 preceding consonant (+ subjoined consonants) as a single glyph,
 whereas above-headline vowels are almost always treated as
 non-spacing combining marks.)

 Currently there seems to be no easy or standardized workaround
 for these problems, and the standard seems to say that the
 relative values of assigned Canonical Combining Class Values
 cannot be changed.

 Any suggestions as to how to create a standardized workaround
 for these incorrect values?

 - Chris
