On 02/04/2004 15:01, Asmus Freytag wrote:

...

Think of the example of SHY (soft hyphen), used to mark possible hyphenation
points in a word. A while ago we had a discussion on this list where there was
an interesting minimal pair of German compounds:


Wachs|tu-be  (tube of (or made of) wax)
Wach|stu-be  (guard room)

The word boundary (which is also a hyphenation point) is marked as |; a secondary
hyphenation point is marked with -. In other words, each word has two SHYs in it,
but not both in the same location.


I can remove the SHYs from these words, and if the text is not broken across lines
at that point, its meaning for the human reader doesn't change. With context, the
text is unambiguous, but if there isn't enough context, the text is clearly ambiguous.


However, equally clearly, by leaving the SHY in the text, it is (in its internal
representation) entirely unambiguous, even if that semantic difference is not
surfaced to the reader (except if a line break fortuitously happens to be present
in the first half of the word).
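
As a rough illustration of the point above, here is a minimal Python sketch. It assumes the | and - marks in the example both stand for U+00AD SOFT HYPHEN; the helper names strip_shy and shy_offsets are made up for this illustration and are not from any library.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN

# Assumption: both '|' and '-' in the example are stored as SHY.
wax_tube = "Wachs" + SHY + "tu" + SHY + "be"    # Wachs|tu-be
guard_room = "Wach" + SHY + "stu" + SHY + "be"  # Wach|stu-be


def strip_shy(text):
    """Return the text as the reader sees it when no line break occurs."""
    return text.replace(SHY, "")


def shy_offsets(text):
    """Return the offsets in the visible text at which a SHY is stored."""
    offsets = []
    visible = 0
    for ch in text:
        if ch == SHY:
            offsets.append(visible)
        else:
            visible += 1
    return offsets


# Without the SHYs the two words are the same string ...
print(strip_shy(wax_tube) == strip_shy(guard_room))  # True
# ... but the stored strings differ, in the first SHY position only.
print(wax_tube == guard_room)                        # False
print(shy_offsets(wax_tube), shy_offsets(guard_room))  # [5, 7] [4, 7]
```

The second SHY sits at the same offset in both words, so only the first one carries the distinction, and it only becomes visible if a line break falls there.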


Of course a (good) screen reader could pick up on the difference and split the
compound correctly when pronouncing it.


Interesting. But suppose the typesetting rules for German were changed so that hyphenation is no longer permitted, or so that (as in many languages) hyphenation points are determined strictly from the letters. These two words can no longer be distinguished by the position of SHY. But the good screen reader would still need to distinguish their pronunciations. Is there any type of character which could be defined, in Unicode, to preserve this distinction, but to be completely hidden in display? Perhaps some kind of zero width morpheme break character? I suppose ZWNJ or WJ could be used, but they might have other undesirable characteristics. (ZWNJ would inhibit formation of an st ligature in certain fonts in Wachs|tube, but maybe that is also desirable.)
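
To make the question concrete, here is a small Python sketch of the idea. It is only an assumption for illustration that ZWNJ (or WJ) is pressed into service as a morpheme break marker; nothing in Unicode defines it that way, and the helpers mark, morphemes and visible are hypothetical names.

```python
import unicodedata

ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER
WJ = "\u2060"    # U+2060 WORD JOINER


def mark(parts, marker):
    """Join morphemes with an invisible marker (hypothetical convention)."""
    return marker.join(parts)


def morphemes(text, marker):
    """Recover the morphemes, as a screen reader might before pronouncing."""
    return text.split(marker)


def visible(text, marker):
    """Approximate what is displayed: the text without the marker."""
    return text.replace(marker, "")


wax_tube = mark(["Wachs", "tube"], ZWNJ)
guard_room = mark(["Wach", "stube"], ZWNJ)

# Both candidates are format characters (general category Cf).
for ch in (ZWNJ, WJ):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

print(morphemes(wax_tube, ZWNJ))    # ['Wachs', 'tube']
print(morphemes(guard_room, ZWNJ))  # ['Wach', 'stube']
print(visible(wax_tube, ZWNJ) == visible(guard_room, ZWNJ))  # True
```

This only shows that the distinction survives in the stored text; it says nothing about the side effects on ligatures or line breaking, which is exactly where ZWNJ and WJ might behave undesirably.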

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



