On 01/14/2011 07:44 AM, Nick Sabalausky wrote:
"Andrei Alexandrescu"<[email protected]>  wrote in message
news:[email protected]...
On 1/13/11 10:26 PM, Nick Sabalausky wrote:
[snip]
[ 'f', {u with the umlaut}, 'n', 'f' ]

Or:

[ 'f', 'u', {umlaut combining character}, 'n', 'f' ]

Those *both* get rendered exactly the same, and both represent the same
four-letter sequence. In the second example, the 'u' and the {umlaut
combining character} combine to form one grapheme. The f's and n's just
happen to be single-code-point graphemes.
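
A minimal sketch of the two spellings above, assuming Python's standard `unicodedata` module, with "fünf" as the four-letter word; the helper name `same_text` is made up for illustration:

```python
import unicodedata

precomposed = "f\u00fcnf"   # 'ü' as the single code point U+00FC
decomposed = "fu\u0308nf"   # 'u' followed by U+0308 COMBINING DIAERESIS

def same_text(a, b):
    """Compare two strings after normalizing both to NFC (composed form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(len(precomposed), len(decomposed))   # 4 5 (code points, not graphemes)
print(precomposed == decomposed)           # False: the code point sequences differ
print(same_text(precomposed, decomposed))  # True: same text once normalized
```

A plain code-point comparison treats the two as different strings, which is why comparisons usually need a normalization step first.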

Note that while some characters exist in pre-combined form (such as the
{u with the umlaut} above), legend has it there are others that can only be
represented using a combining character.

It's also my understanding, though I'm not certain, that sometimes multiple
combining characters can be used together on the same "root" character.
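
For what it's worth, stacking several marks on one base can be sketched as below, again assuming Python's `unicodedata`. The 'q' with a tilde above and a dot below is an arbitrary example, and the helper names are made up:

```python
import unicodedata

# One base letter carrying two combining marks, typed in either order.
one_order = "q\u0303\u0323"    # q + COMBINING TILDE + COMBINING DOT BELOW
other_order = "q\u0323\u0303"  # q + COMBINING DOT BELOW + COMBINING TILDE

def combining_marks(text):
    """Names of the code points that have a nonzero combining class."""
    return [unicodedata.name(ch) for ch in text if unicodedata.combining(ch)]

def canonically_equal(a, b):
    """NFD puts marks of different combining classes into a canonical order."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

print(combining_marks(one_order))
print(one_order == other_order)                   # False
print(canonically_equal(one_order, other_order))  # True
```

The two orders count as the same text here only because the marks sit in different positions (above and below); two marks of the same combining class keep their typed order.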

Thanks. One further question is: in the above example with u-with-umlaut,
there is one code point that corresponds to the entire combination. Are
there combinations that do not have a unique code point?


My understanding is "yes". At least that's what I've heard, and I've never
heard any claims of "no". I don't know of any specific ones offhand, though.
Actually, it might be possible to use any combining character with any old
letter or number (like maybe a 7 with an umlaut), though I'm not certain.
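
A rough way to probe this, assuming Python's `unicodedata`: compose the pair with NFC and see whether it collapses to one code point. `has_precomposed` is a hypothetical helper, and the check is only approximate, since a few precomposed characters are excluded from NFC composition and would be reported as missing:

```python
import unicodedata

def has_precomposed(base, mark):
    """True if NFC composes base + mark into a single code point."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1

DIAERESIS = "\u0308"  # COMBINING DIAERESIS (the umlaut dots)

print(has_precomposed("u", DIAERESIS))  # True: composes to U+00FC
print(has_precomposed("7", DIAERESIS))  # False: stays as two code points
```

So a 7 with an umlaut is a valid sequence, but it exists only in decomposed form.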

The problem is then whether a font knows how to display it. My usual fonts (DejaVu series, pretty good with Unicode) show:
meaning they do not know how to combine digits with diacritics (though they handle other rather strange combinations well).

But one relevant advantage of decomposed forms is that when a font doesn't know the character, it can still show at least the component marks, here '7' & '~'. That is better than nothing for a user who knows the writing system. If I try to display, for instance, a _precomposed_ syllable from a language my font does not know, I will get either a little square with the code point written inside in tiny digits, or a placeholder like an inverse-video "?".


denis
_________________
vita es estrany
spir.wikidot.com
