Peter Kirk
Thu, 20 Nov 2003 06:58:48 -0800
...I wasn't thinking of any specific combining character. But I was thinking of the general principle that if one wants to display an isolated diacritic glyph, which is possible in principle, at least in paradigm lists (and code charts!), for any of the characters you list above, the recommended way of doing so is to apply them to SP or NBSP. Unfortunately there are many problems and undesirable side effects of this recommendation.
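As a rough illustration of the recommendation discussed above, the sketch below builds an isolated diacritic by applying a combining mark to NBSP. The helper name `isolated` and the choice of U+0301 are assumptions made only for this example. It also shows one side effect, not necessarily among those the post has in mind: compatibility normalization (NFKC) folds NBSP to a plain SPACE.

```python
import unicodedata

NBSP = "\u00a0"   # NO-BREAK SPACE
ACUTE = "\u0301"  # COMBINING ACUTE ACCENT (example mark, chosen arbitrarily)

def isolated(mark, base=NBSP):
    """Return the base character followed by the combining mark (hypothetical helper)."""
    return base + mark

seq = isolated(ACUTE)

# NFC leaves the sequence alone: NBSP has no canonical decomposition.
nfc = unicodedata.normalize("NFC", seq)

# NFKC replaces NBSP with SPACE, so the no-break property is lost.
nfkc = unicodedata.normalize("NFKC", seq)

print([f"U+{ord(c):04X}" for c in nfc])
print([f"U+{ord(c):04X}" for c in nfkc])
```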
This trick doesn't work if any of the CC's are in combining class zero.
Of course, but which combining character of combining class 0 does need to combine with NBSP in a way that affects renderers?
Do you think about sequences like <NBSP,CGJ>?
Or about issues when rendering <07A6;THAANA ABAFILI;Mn;0;NSM;;;;;N;;;;;> after <NBSP> which of course would be handled only as <WJ,SP,WJ,THAANA ABAFILI> ?
Or about: <0901;DEVANAGARI SIGN CANDRABINDU;Mn;0;NSM;;;;;N;;;;;> after <NBSP> rendered as if it was <WJ,SP,WJ,CANDRABINDU> ?
Or about <0903;DEVANAGARI SIGN VISARGA;Mc;0;L;;;;;N;;;;;> after <NBSP> which is this time a "Mc" character ?
Or about all the Indic vowels which do not seem to be really combining on NBSP but would be rendered as a space followed by a defective isolated form of the vowel (so without vowel glyphs reordering around the space) ?
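The properties quoted in the questions above can be checked with Python's standard `unicodedata` module. This is only a lookup sketch: the `CHARS` table and the `describe` helper are names invented for this example, and the values come from whatever Unicode version the Python build ships with.

```python
import unicodedata

# Characters named in the questions above, plus NBSP itself.
CHARS = {
    "NBSP": "\u00a0",
    "CGJ": "\u034f",
    "THAANA ABAFILI": "\u07a6",
    "DEVANAGARI SIGN CANDRABINDU": "\u0901",
    "DEVANAGARI SIGN VISARGA": "\u0903",
}

def describe(ch):
    """Return (general category, canonical combining class) for one character."""
    return unicodedata.category(ch), unicodedata.combining(ch)

for label, ch in CHARS.items():
    cat, ccc = describe(ch)
    print(f"U+{ord(ch):04X} {label}: gc={cat} ccc={ccc}")
```

All five report combining class 0. NBSP is "Zs", the visarga is "Mc", and the other three are "Mn".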
Just curious...
If we just say that <NBSP> behaves in all cases in renderers as if it was <WJ,SP,WJ>, where WJ is reordered with a pseudo-combining class 256, it solves many problems with the interpretation of NBSP, and it looks as if NBSP was a space letter. However, NBSP is not a "Lo" character but really a "Zs" whitespace, and thus justifiable out of the end margin. Also, NBSP does not prohibit word breaks but only line breaks, so it is more as if it was in fact <LJ,SP,LJ>, where LJ is a line-joiner, distinct also from ZWJ (zero-width joiner), which is used to hint ligatures but does not prohibit any break.
Well, WJ itself is actually LJ, because, astonishingly, it does not prohibit word breaks (see UAX29). Similarly ZWNBS, ZWJ, and ZWNJ. As format characters these are ignored when finding word breaks. The implication is that <A,B,WJ,C,D> is a single word, but <A,B,WJ,SPACE,WJ,C,D> and <A,B,WJ,$,WJ,C,D> are both two words despite the obvious attempt to use WJ to force these to be understood as one word (and despite the existence of alphabets in which "$" is considered alphabetic).
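The word-break point about WJ can be mimicked with a deliberately crude sketch. This is not an implementation of UAX29. It only assumes two things taken from the discussion: format ("Cf") characters are skipped when looking for word boundaries, and anything that is not a letter ends a word. The function name `toy_words` is hypothetical.

```python
import unicodedata

WJ = "\u2060"  # WORD JOINER, general category Cf

def toy_words(text):
    """Toy segmentation: ignore Cf characters, group runs of letters into words."""
    words, current = [], []
    for ch in text:
        cat = unicodedata.category(ch)
        if cat == "Cf":
            continue  # format characters are transparent to the boundary search
        if cat.startswith("L"):
            current.append(ch)
        elif current:
            # Any non-letter (space, "$", ...) closes the word in progress.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words

print(toy_words("AB" + WJ + "CD"))
print(toy_words("AB" + WJ + " " + WJ + "CD"))
print(toy_words("AB" + WJ + "$" + WJ + "CD"))
```

Under these assumptions the first sequence yields one word and the other two yield two words each, regardless of the WJ characters around the space or the "$".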
-- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/