On 19/11/2003 01:49, Pim Blokland wrote:

In the online 4.0 book, chapter 15

http://www.unicode.org/versions/Unicode4.0.0/ch15.pdf

the definition for Word Joiner says:



Until Unicode 3.1.1, U+FEFF was the only code point with word
joining semantics, but because it is more commonly used as
byte order mark, the use of U+2060 [word joiner] to indicate
word joining is strongly preferred for any new text.





Perhaps this depends on what is meant by "word joining semantics". I would presume this to imply that a word boundary is not permitted at this point, but in fact, under the current definitions in UAX #29 (http://www.unicode.org/reports/tr29/tr29-5.html), ZWNBS, WJ and NBSP are all treated as word boundary characters.
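For anyone who wants to look at the three characters side by side, here is a minimal sketch using Python's standard unicodedata module. The labels NBSP, WJ and ZWNBS are just the abbreviations used in this thread, and the helper name describe is my own. Note that unicodedata does not expose the Word_Break property, so this cannot confirm or refute the UAX #29 classification; it only shows the names, general categories and decomposition mappings.

```python
import unicodedata

# The three characters under discussion (abbreviations as used in this thread).
CHARS = {
    "NBSP": "\u00A0",
    "WJ": "\u2060",
    "ZWNBS": "\uFEFF",
}

def describe(ch):
    """Return (code point, name, general category, decomposition) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.decomposition(ch),  # empty string if there is no mapping
    )

for label, ch in CHARS.items():
    print(label, *describe(ch))
```

In the character database, NBSP carries a compatibility mapping to U+0020 tagged <noBreak>, while WJ and ZWNBS are format characters with no decomposition at all, so the "behaves like" wording in chapter 15 is descriptive rather than something encoded in the data.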

However, a couple of paragraphs up, the definition for No-Break
Space says:



U+00A0 [No-Break Space] behaves like the following coded
character sequence: U+FEFF [Zero Width No-Break Space] +
U+0020 [Space] + U+FEFF [Zero Width No-Break Space].



Is this something that has slipped by the editors? Or am I missing something?

Pim Blokland


Does this equivalence hold when combining characters are applied to the NBSP? Is the sequence <NBSP, CC> (recommended for spacing diacritics, where CC is any sequence of combining characters) equivalent to <ZWNBS, SP, ZWNBS, CC>? Or should the equivalence be to <ZWNBS, SP, CC, ZWNBS>? Is it legal to combine combining characters with ZWNBS, or WJ, and how should this be rendered?
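To make the question concrete, the three candidate sequences can be written out and compared under the standard normalization forms. This is only a sketch: U+0301 COMBINING ACUTE ACCENT stands in for CC as an arbitrary sample, and the helper names are my own. It says nothing about how a renderer or a line breaker treats the sequences; it only checks whether normalization ever maps one candidate onto another.

```python
import unicodedata

NBSP, SP, ZWNBS = "\u00A0", "\u0020", "\uFEFF"
CC = "\u0301"  # COMBINING ACUTE ACCENT, chosen only as a sample combining character

# The three sequences named in the question above.
candidates = {
    "<NBSP, CC>": NBSP + CC,
    "<ZWNBS, SP, ZWNBS, CC>": ZWNBS + SP + ZWNBS + CC,
    "<ZWNBS, SP, CC, ZWNBS>": ZWNBS + SP + CC + ZWNBS,
}

def normalized_forms(form):
    """Normalize every candidate to the given form (NFC, NFD, NFKC or NFKD)."""
    return {label: unicodedata.normalize(form, text)
            for label, text in candidates.items()}

def all_distinct(form):
    """True if no two candidates become identical under the given form."""
    return len(set(normalized_forms(form).values())) == len(candidates)

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, all_distinct(form))
```

If the candidates stay distinct under every form, then whatever equivalence chapter 15 intends is not one that normalization knows about, and the placement of CC relative to the second ZWNBS remains a question about rendering and break behaviour.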

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




