On 14.01.2011 08:00, Nick Sabalausky wrote:
"Nick Sabalausky"<[email protected]> wrote in message
news:[email protected]...
"Andrei Alexandrescu"<[email protected]> wrote in message
news:[email protected]...
On 1/13/11 10:26 PM, Nick Sabalausky wrote:
[snip]
[ 'f', {u with the umlaut}, 'n', 'f' ]
Or:
[ 'f', 'u', {umlaut combining character}, 'n', 'f' ]
Those *both* get rendered exactly the same, and both represent the same
four-letter sequence. In the second example, the 'u' and the {umlaut
combining character} combine to form one grapheme. The f's and n's just
happen to be single-code-point graphemes.
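
To make the two spellings concrete, here is a minimal Python sketch (Python and its standard unicodedata module are my choice for illustration, nothing from this thread). The helper names code_points and same_text are made up for the example. It assumes that normalizing both strings to NFC is an acceptable way to compare them.

```python
import unicodedata

# Two spellings of the same four-letter word from the example above.
precomposed = "f\u00fcnf"   # U+00FC LATIN SMALL LETTER U WITH DIAERESIS
combining = "fu\u0308nf"    # 'u' followed by U+0308 COMBINING DIAERESIS


def code_points(text):
    """List the code points of text as U+XXXX strings (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


def same_text(a, b):
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(code_points(precomposed))            # four code points
print(code_points(combining))              # five code points
print(precomposed == combining)            # raw comparison sees different sequences
print(same_text(precomposed, combining))   # equal once both are normalized
```

Note that len() here counts code points, not graphemes, so the two strings report different lengths even though they render the same.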
Note that while some characters exist in pre-combined form (such as the
{u with the umlaut} above), legend has it there are others that can only be
represented using a combining character.
It's also my understanding, though I'm not certain, that sometimes multiple
combining characters can be used together on the same "root" character.
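
As a rough check of the "multiple combining characters" point, the Python sketch below stacks two marks on one base letter. The example character (an 'e' with a circumflex and a dot below, as used in Vietnamese) and the helper name combining_marks are my own picks for illustration, not something from this thread.

```python
import unicodedata

# Illustrative example: 'e' + U+0302 COMBINING CIRCUMFLEX ACCENT
#                           + U+0323 COMBINING DOT BELOW
stacked = "e\u0302\u0323"


def combining_marks(text):
    """Return the characters whose canonical combining class is non-zero
    (hypothetical helper)."""
    return [ch for ch in text if unicodedata.combining(ch) != 0]


marks = combining_marks(stacked)
composed = unicodedata.normalize("NFC", stacked)

print(len(marks))                                  # number of marks on the one base letter
print([f"U+{ord(ch):04X}" for ch in composed])     # code points after NFC composition
```

This particular pair happens to have a precomposed form (U+1EC7), so NFC collapses all three code points into one. That is a property of this example, not of stacked marks in general.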
Thanks. One further question is: in the above example with u-with-umlaut,
there is one code point that corresponds to the entire combination. Are
there combinations that do not have a unique code point?
My understanding is "yes". At least that's what I've heard, and I've never
heard any claims of "no". I don't know of any specific ones offhand,
though. Actually, it might be possible to use any combining character with
any old letter or number (like maybe a 7 with an umlaut), though I'm not
certain.
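
The "7 with an umlaut" guess can be tested with a small Python sketch. The function name composes_to_one is hypothetical, and the check is only a heuristic: it asks whether NFC normalization folds a base plus one mark into a single code point. A few precomposed characters are deliberately excluded from NFC composition, so a False result does not strictly prove that no precomposed code point exists.

```python
import unicodedata

DIAERESIS = "\u0308"  # U+0308 COMBINING DIAERESIS (the "umlaut" mark)


def composes_to_one(base, mark):
    """Heuristic (hypothetical helper): does NFC fold base + mark into a
    single code point?"""
    composed = unicodedata.normalize("NFC", base + mark)
    return len(composed) == 1


print(composes_to_one("u", DIAERESIS))   # 'u' + diaeresis has a precomposed form
print(composes_to_one("7", DIAERESIS))   # '7' + diaeresis stays as two code points
```

So the sequence is still valid text, it just has no single-code-point spelling that normalization can produce.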
FWIW, the Wikipedia article might help, or at least link to other things
that might help: http://en.wikipedia.org/wiki/Combining_character
Michel or spir might have better links though.
Heh, as if that wasn't bad enough, there are also digraphs which, from what I
can tell, seem to be single code points that represent more than one
glyph/character/grapheme:
http://en.wikipedia.org/wiki/Digraph_(orthography)#Digraphs_in_Unicode
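
A quick Python sketch of one such digraph code point, U+01C6 (my choice of example, not one named in this thread). It assumes compatibility normalization (NFKC) is the relevant way to split the digraph into separate letters. Canonical normalization (NFC) leaves it untouched.

```python
import unicodedata

dz = "\u01c6"  # a single code point that displays as the two letters "dž"

print(unicodedata.name(dz))                  # official character name
canonical = unicodedata.normalize("NFC", dz)
compat = unicodedata.normalize("NFKC", dz)

print(len(dz), len(canonical), len(compat))  # code point counts for each form
print([f"U+{ord(ch):04X}" for ch in compat]) # the letters it splits into
```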
This page may be helpful too:
http://en.wikipedia.org/wiki/Precomposed_character
OMG, this is really fucked up.
Can't we just go back to 8-bit charsets like ISO 8859-* etc.? :/