Abel Braaksma wrote:
DA Shetland wrote:
In fact, SHY is the character that is not a character: it is a
one-character processing instruction that happens to enjoy a code
point in character tables. Strictly speaking, it doesn't even need a
glyph (we already have a code point for the hyphen).
There are many "characters" of this kind, like the RTL and LTR marks
and combining characters.
Yes. But all the ones I can think of pretty much look like the control
characters or sequences they are. With the SHY, you have to really look
carefully to realize that it is one of "those." Then look again (and
again :-) to figure out the deeper meaning(s).
In every case, whether hyphenation is available or not, and whether it
is turned on or not, the SHY needs to disappear from the string. Of
course, if hyphenation is available, its location needs to be
remembered for later use.
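The up-front removal described above (drop the SHY, but keep its
location for a later hyphenation pass) could look like the minimal
Python sketch below. The name `strip_shy` is hypothetical, and
recording offsets into the cleaned string is one assumed way to
"remember the location"; it is not taken from FOP or any other
formatter.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN

def strip_shy(text: str) -> tuple[str, list[int]]:
    """Hypothetical pre-filter: remove every soft hyphen and return the
    cleaned string plus the offsets (in the cleaned string) where a soft
    hyphen used to be, i.e. the remembered break opportunities."""
    cleaned: list[str] = []
    positions: list[int] = []
    for ch in text:
        if ch == SHY:
            # The break opportunity lies before the next character we keep.
            positions.append(len(cleaned))
        else:
            cleaned.append(ch)
    return "".join(cleaned), positions

word = "hy\u00adphen\u00adation"
clean, breaks = strip_shy(word)
print(clean, breaks)  # hyphenation [2, 6]
```

Keeping the offsets separate means the rendered string is SHY-free, while
a hyphenation step can still look up where the author allowed a break.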
As it happens, the Unicode Consortium defines this "glyph" as one that
should indeed not appear (this changed from Unicode 3 to Unicode 4).
But the "of course" in your sentence is not as obvious as it may seem.
This article gives in-depth coverage of the SOFT HYPHEN:
http://www.cl.cam.ac.uk/~mgk25/ucs/L2/03155r-kuhn-soft-hyphen.pdf.
Thank you! Excellent summary.
In short: the soft hyphen should not be removed from the data stream.
However, it should not be visible in the text unless the word must be
hyphenated because a line break falls at that point.
To support this, the SOFT HYPHEN belongs to the Cf (Other, format)
general category: it normally has no visible appearance, except in
special situations such as a line break. It is also not counted when
the number of characters is counted, and comparison functions ignore
it.
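The behavior described above (the SHY stays in the data, is invisible
unless a line breaks at it, and is ignored when counting and comparing)
can be sketched in Python. This is a toy model under stated
assumptions: every visible character occupies one cell, the hyphen
shown at a break is U+002D HYPHEN-MINUS, and the helper names
(`is_format_char`, `visible_length`, `shy_insensitive_equal`,
`fit_word`) are hypothetical, not part of any formatter's API.

```python
import unicodedata

SHY = "\u00ad"  # U+00AD SOFT HYPHEN

def is_format_char(ch: str) -> bool:
    """True for characters in the Cf (Other, format) general category."""
    return unicodedata.category(ch) == "Cf"

def visible_length(text: str) -> int:
    """Count characters, treating soft hyphens as if they were not there."""
    return sum(1 for ch in text if ch != SHY)

def shy_insensitive_equal(a: str, b: str) -> bool:
    """Compare two strings while ignoring soft hyphens in either one."""
    return a.replace(SHY, "") == b.replace(SHY, "")

def fit_word(word: str, width: int) -> tuple[str, str]:
    """Fit `word` into `width` cells (assumed: one cell per visible char).
    Returns (text for this line, remainder carried to the next line)."""
    visible = word.replace(SHY, "")
    if len(visible) <= width:
        # No break needed, so the soft hyphens stay invisible.
        return visible, ""
    # Scan break opportunities from the right to fill the line as much as possible.
    for i in range(len(word) - 1, -1, -1):
        if word[i] != SHY:
            continue
        head = word[:i].replace(SHY, "") + "-"  # hyphen shown only at the break
        if len(head) <= width:
            return head, word[i + 1:]  # the remainder keeps its soft hyphens
    # No soft hyphen yields a short enough first part: move the whole word.
    return "", word

word = "hy\u00adphen\u00adation"
print(len(word), visible_length(word))  # 13 11
print(fit_word(word, 20))  # ('hyphenation', '')
print(fit_word(word, 8))   # ('hyphen-', 'ation')
```

Python's built-in `len()` counts the soft hyphens as ordinary code
points, so SHY-aware counting has to be done explicitly. A real
formatter would rely on proper line-breaking and collation machinery
rather than these plain string replacements.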
Again, thank you for the very helpful extensions and corrections to my
thoughts, which were, as usual, drifting toward far too
implementation-oriented a view. My saying that strings have to be
de-SHYed up front amounted to saying that I don't trust the
presentation layer to do what it needs to do for all categories of
control characters, combining characters, etc., of which SHY is one.
So I will pre-filter and watch for news.
-- Abel
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]