Abel Braaksma wrote:
DA Shetland wrote:
In fact SHY is the character that is not a character - it is a one-character processing instruction that happens to enjoy a code point in character tables, but strictly speaking, it doesn't even need a glyph (we have a code point for the hyphen).

There are many "characters" of this kind, such as the RTL and LTR markers and combining characters.
Yes. But all the ones I can think of pretty much look like the control characters or sequences they are. With the SHY, you have to really look carefully to realize that it is one of "those." Then look again (and again :-) to figure out the deeper meaning(s).

In every case, hyphenation available on or not, turned on or not, the SHY needs to disappear from the string - of course, if hyphenation is available, the location needs to be remembered for later use.

As it happens, the Unicode Consortium defines this "glyph" as one that should indeed not appear (a change from Unicode 3 to Unicode 4). But the "of course" in your sentence is not as obvious as it may seem. This article covers the SOFT HYPHEN in depth: http://www.cl.cam.ac.uk/~mgk25/ucs/L2/03155r-kuhn-soft-hyphen.pdf.
Thank you!  Excellent summary.

In short: the soft hyphen should not be removed from the data stream. However, it should not appear in the sentence, unless the sentence must be hyphenated due to a line break.

To support this, the SOFT HYPHEN belongs to the Cf (Other, Format) general category, whose characters normally have no visible appearance except in special situations. It also does not count as a character when the number of characters is counted, and it is ignored in comparison functions.
Again, thank you for the very helpful extension and correction of my thoughts, which were, as usual, drifting too far toward implementation. My saying that strings have to be de-SHYed up front was as much as saying that I don't trust the presentation layer to do what it needs to do for all categories of control characters, combining characters, etc., of which SHY is one.

So I will pre-filter and watch for news.
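
For what it is worth, the pre-filter I have in mind could look roughly like the sketch below. It is written in Python only for illustration; the names strip_shy and SHY are my own, not part of any library, and a real implementation would live wherever the strings enter the formatter. It removes each SOFT HYPHEN (U+00AD) and remembers its offset in the cleaned string, so that a later hyphenation step could still use those positions.

```python
import unicodedata

SHY = "\u00ad"  # SOFT HYPHEN, general category Cf in current Unicode


def strip_shy(text):
    """Remove soft hyphens and remember where they were.

    Returns (clean_text, offsets), where each offset is an index into
    clean_text at which a line break with a visible hyphen is permitted.
    Hypothetical helper, for illustration only.
    """
    clean = []
    offsets = []
    for ch in text:
        if ch == SHY:
            # Record the position in the *cleaned* string, not the original.
            offsets.append(len(clean))
        else:
            clean.append(ch)
    return "".join(clean), offsets


if __name__ == "__main__":
    word = "hy\u00adphen\u00ada\u00adtion"
    clean, offsets = strip_shy(word)
    print(unicodedata.category(SHY))  # general category of U+00AD
    print(clean, offsets)
```

Keeping the offsets separate means the cleaned string can be measured and compared as ordinary text, while the break opportunities are not lost if hyphenation becomes available later.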

-- Abel


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



