DA Shetland wrote:
In fact SHY is the character that is not a character - it is a one character size processing instruction that happens to enjoy a code point in character tables, but strictly speaking, it doesn't even need a glyph (we have a code point for the hyphen).
There are many "characters" of this kind, such as the RTL and LTR marks and the combining characters.
In every case, hyphenation available on or not, turned on or not, the SHY needs to disappear from the string - of course, if hyphenation is available, the location needs to be remembered for later use.
It happens that the Unicode Consortium defines this "glyph" as one that should indeed not appear (this changed from Unicode 3 to Unicode 4). But the "of course" in your sentence is not as obvious as it may seem. This article covers the SOFT HYPHEN in depth: http://www.cl.cam.ac.uk/~mgk25/ucs/L2/03155r-kuhn-soft-hyphen.pdf.
In short: the soft hyphen should not be removed from the data stream. However, it should not be displayed unless the word is actually broken at that position by a line break.
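As a rough illustration of that rule, here is a minimal Python sketch. The function names `visible` and `break_word` are made up for this example, and the width is counted in code points, which is a simplification (a real layout engine measures glyph widths). The stored string keeps its soft hyphens; only the rendered output drops them or turns one into a visible hyphen.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN

def visible(text):
    """Return the text as displayed when no line break occurs."""
    return text.replace(SHY, "")

def break_word(word, width):
    """Hypothetical helper: split `word` at the last soft hyphen that fits.

    Returns (first_line, rest). `rest` keeps its remaining soft hyphens,
    so it can be broken again later.
    """
    if len(visible(word)) <= width:
        return visible(word), ""          # fits: no hyphen is shown
    parts = word.split(SHY)
    # Try the latest possible break point first.
    for i in range(len(parts) - 1, 0, -1):
        head = "".join(parts[:i]) + "-"   # hyphen shown only at the break
        if len(head) <= width:
            return head, SHY.join(parts[i:])
    return visible(word), ""              # no usable break point

stored = "hy" + SHY + "phen" + SHY + "a" + SHY + "tion"
first, rest = break_word(stored, 8)
```

With a width of 8 this yields "hyphena-" and "tion"; with a width of 20 the word is shown unhyphenated, and `stored` is never modified.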
To support this, the SOFT HYPHEN is a character in general category Cf (Other, format), which normally has no visible appearance except in special situations. It also does not count as a character when the number of characters is counted, nor does it take part in comparison functions (it is ignored).
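To make this concrete, a small Python sketch using the standard `unicodedata` module. Note that Python's own `len()` and `==` do not ignore the soft hyphen, so the helpers below (names invented for this example) implement the behaviour described above by hand. Dropping every Cf character is an assumption made for brevity; some format characters, such as ZERO WIDTH JOINER, can matter to the text.

```python
import unicodedata

SHY = "\u00ad"  # U+00AD SOFT HYPHEN

def strip_format(text):
    """Drop all characters of general category Cf (assumption: all are ignorable)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

def visible_length(text):
    """Count characters, ignoring format characters such as SHY."""
    return len(strip_format(text))

def equal_ignoring_format(a, b):
    """Compare two strings as if format characters were not there."""
    return strip_format(a) == strip_format(b)

word = "hy" + SHY + "phen"
shy_category = unicodedata.category(SHY)  # "Cf"
```

Here `len(word)` is 7 while `visible_length(word)` is 6, and `equal_ignoring_format(word, "hyphen")` is true even though `word == "hyphen"` is false.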
-- Abel
