On 19/11/2003 01:49, Pim Blokland wrote:
> In the online 4.0 book, chapter 15
> http://www.unicode.org/versions/Unicode4.0.0/ch15.pdf
> the definition for Word Joiner says:
>
>     Until Unicode 3.1.1, U+FEFF was the only code point with word
>     joining semantics, but because it is more commonly used as
>     byte order mark, the use of U+2060 [word joiner] to indicate
>     word joining is strongly preferred for any new text.
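A minimal Python sketch of why the byte order mark role gets in the way. The sample strings are made up for illustration. A BOM-aware decoder, such as Python's "utf-8-sig" codec, silently drops a U+FEFF at the very start of the data. A word-joining ZWNBS that happens to begin a text can therefore be lost, whereas U+2060 has no such second role.

```python
import codecs

# Hypothetical sample strings: U+FEFF used as a zero width no-break space,
# once inside a word and once at the very start of the text.
internal = "a\ufeffb"
leading = "\ufeffab"
wj_leading = "\u2060ab"  # the same text using U+2060 WORD JOINER instead

# Encoded in UTF-8, U+FEFF is byte-for-byte the UTF-8 byte order mark.
print("\ufeff".encode("utf-8") == codecs.BOM_UTF8)  # True

# A BOM-sniffing reader ("utf-8-sig") strips U+FEFF only at the start of the data.
roundtrip_internal = internal.encode("utf-8").decode("utf-8-sig")
roundtrip_leading = leading.encode("utf-8").decode("utf-8-sig")
roundtrip_wj = wj_leading.encode("utf-8").decode("utf-8-sig")

print(repr(roundtrip_internal))  # 'a\ufeffb'   kept
print(repr(roundtrip_leading))   # 'ab'         lost: taken as a BOM
print(repr(roundtrip_wj))        # '\u2060ab'   kept
```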

Perhaps this depends on what is meant by "word joining semantics". I would
presume this to imply that a word boundary is not permitted at this
point, but in fact, under the current definitions in UAX #29
(http://www.unicode.org/reports/tr29/tr29-5.html), ZWNBS, WJ and NBSP
are all treated as word boundary characters.

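For reference, here is a quick look at the general categories of these characters, using Python's unicodedata module. That module reflects the Unicode version bundled with the interpreter, not 4.0. It also does not expose the UAX #29 word-break property itself, so this sketch shows only the general category. NBSP comes out as a space separator (Zs) like SP, while ZWNBS and WJ are invisible format characters (Cf).

```python
import unicodedata

# General category (not the UAX #29 Word_Break property) of the characters discussed.
chars = {"ZWNBS": "\ufeff", "WJ": "\u2060", "NBSP": "\u00a0", "SP": "\u0020"}
categories = {label: unicodedata.category(ch) for label, ch in chars.items()}

for label, ch in chars.items():
    print(f"{label:6} U+{ord(ch):04X} {unicodedata.name(ch):26} {categories[label]}")
```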
> However, a couple of paragraphs up, the definition for No-Break
> Space says:
>
>     U+00A0 [No-Break Space] behaves like the following coded
>     character sequence: U+FEFF [Zero Width No-Break Space] +
>     U+0020 [Space] + U+FEFF [Zero Width No-Break Space].
>
> Is this something that has slipped by the editors? Or am I missing
> something?
>
> Pim Blokland

Does this equivalence hold when combining characters are applied to the
NBSP? Is the sequence <NBSP, CC> (recommended for spacing diacritics,
where CC is any sequence of combining characters) equivalent to <ZWNBS,
SP, ZWNBS, CC>? Or should the equivalence be to <ZWNBS, SP, CC, ZWNBS>?
Is it legal to apply combining characters to ZWNBS or WJ, and if so,
how should the result be rendered?
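To make the question concrete, here is a rough Python sketch that builds the three candidate sequences and reports which code point each mark immediately follows. It uses U+0301 COMBINING ACUTE ACCENT as an arbitrary stand-in for CC. The "base" here is simply the preceding code point, a deliberate simplification. The sketch also suggests that normalization does not settle the matter. NFKD maps NBSP to SP through its <noBreak> compatibility decomposition but leaves ZWNBS untouched. The chapter 15 equivalence therefore seems to be a statement about behaviour rather than one the normalization forms enforce.

```python
import unicodedata

NBSP, ZWNBS, SP = "\u00a0", "\ufeff", "\u0020"
CC = "\u0301"  # COMBINING ACUTE ACCENT: an arbitrary example of a combining sequence

# The three sequences asked about above.
candidates = {
    "<NBSP, CC>": NBSP + CC,
    "<ZWNBS, SP, ZWNBS, CC>": ZWNBS + SP + ZWNBS + CC,
    "<ZWNBS, SP, CC, ZWNBS>": ZWNBS + SP + CC + ZWNBS,
}


def mark_bases(s):
    """Return the code point immediately preceding each combining mark.

    Simplification: the preceding code point is taken as the mark's base,
    which is only right for a single mark, not for a run of several.
    """
    return [f"U+{ord(prev):04X}" for prev, ch in zip(s, s[1:])
            if unicodedata.combining(ch)]


bases = {label: mark_bases(s) for label, s in candidates.items()}
for label, b in bases.items():
    print(f"{label:24} mark follows {b}")

# NBSP has only a compatibility (<noBreak>) decomposition to SP;
# ZWNBS has no decomposition at all.
print(unicodedata.decomposition(NBSP))  # <noBreak> 0020
nfkd = {label: unicodedata.normalize("NFKD", s) for label, s in candidates.items()}
for label, s in nfkd.items():
    print(f"{label:24} NFKD -> {s!r}")
```

In the second form the mark lands on a ZWNBS, and in the third it lands on the SP, which is exactly the difference the question turns on.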
--
Peter Kirk
http://www.qaya.org/