Rick McGowan <[EMAIL PROTECTED]> has privately suggested moving
the discussion of combining classes of *Tibetan* characters
from the main Unicode list [EMAIL PROTECTED] to the TIBEX list
[EMAIL PROTECTED] - an "experts" list which was set up several
years ago specifically to discuss proposals for encoding Tibetan
characters in Unicode. If anyone with a particular interest in
Tibetan characters has been following the thread here and would
like to keep following it, perhaps they could ask Rick how to
join that list.

I'll follow Rick's advice - perhaps this discussion is more
appropriate on the TIBEX list - even though the similar issues
with some Hebrew characters which have been raised here (again)
as a result of this thread make me think there may be a need for
a non-script-specific solution or work-around to problems with
canonical combining class values.

Anyway I'm going to move this discussion over there with a
parting shot...

Off-list Robert Chilton has pointed out to me the following:

> 3. A very common occasion of 0F7E occurring with a vowel is in the
> stack HaUm (orthographic sequence of 0F67 0F71 0F74 0F7E).  Because
> 0F7E is currently assigned a cc of zero, this *same glyph-form* could
> theoretically be encoded with a total of 6 different character
> sequences, resulting in 4(!) different sequences following
> normalization.  Properly, all 6 sequences should normalize to the
> same sequence -- which is indeed the case if 0F82 or 0F83 is used in
> place of 0F7E.  Obviously a major problem, not only for rendering but
> also for searching and sorting.

FOUR different sequences possible *after* "normalisation" ???
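
The count is easy to check. Below is a rough sketch, assuming
Python's standard unicodedata module and whatever version of the
Unicode character database ships with the interpreter (so the
combining classes it reports are that version's, not necessarily
the ones under discussion here). The helper name normalized_forms
is made up for illustration. It tries every ordering of the three
marks after 0F67 and counts how many distinct strings survive NFD,
which applies canonical reordering without any recomposition.

```python
import unicodedata
from itertools import permutations

BASE = "\u0F67"          # TIBETAN LETTER HA
VOWELS = "\u0F71\u0F74"  # VOWEL SIGN AA, VOWEL SIGN U


def normalized_forms(sign):
    """Return the distinct NFD strings over all orderings of the marks.

    `sign` is the third mark (0F7E, 0F82 or 0F83); the base letter
    always comes first, only the three following marks are permuted.
    """
    marks = VOWELS + sign
    return {
        unicodedata.normalize("NFD", BASE + "".join(order))
        for order in permutations(marks)
    }


if __name__ == "__main__":
    for sign in ("\u0F7E", "\u0F82", "\u0F83"):
        ccc = unicodedata.combining(sign)  # canonical combining class
        count = len(normalized_forms(sign))
        print(f"U+{ord(sign):04X} ccc={ccc}: "
              f"6 orderings -> {count} distinct after NFD")
```

A mark with a combining class of zero acts as a barrier: canonical
reordering never moves other marks across it, so the orderings in
which 0F7E sits between or before the vowels stay distinct. With
0F82 or 0F83, which have a non-zero class, all three marks can be
sorted and the six orderings collapse to one.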

Personally I would rather have seen all Tibetan characters
given a CCV of 0 (and all pre-combined Tibetan characters
*strongly* deprecated) than this. If someone simply
follows the normal rules for writing Tibetan, then characters
will be entered in a very predictable order which is far easier
to process than the one(s) they can end up in after Unicode
"normalisation".

- Chris Fynn

BTW My apologies to anyone who receives two copies of this
message.

