"spir" <[email protected]> wrote in message news:[email protected]... > > If anyone finds a pointer to such an explanation, bravo, and than you. > (You will certainly not find it in Unicode literature, for instance.) > Nick's explanation below is good and concise. (Just 2 notes added.)
Yea, most Unicode explanations seem to talk all about "code-units vs
code-points" and then they'll just have a brief note like "There's also
other things like digraphs and combining codes." And that'll be all they
mention.

You're right about the Unicode literature. It's the usual standards-body
documentation, same as W3C: "Instead of only some people understanding how
this works, let's encode the documentation in legalese (and have twenty
only-slightly-different versions) to make sure that nobody understands how
it works."

> You can also say there are 2 kinds of characters: simple like "u" &
> composite "ü" or "ü??". The former are coded with a single (base) code,
> the latter with one (rarely more) base codes and an arbitrary number of
> combining codes.

A couple of questions about the "more than one base codes":

- Do you know an example offhand?

- Does that mean like a ligature where the base codes form a single glyph,
or does it mean that the combining code either spans or operates over
multiple glyphs? Or can it go either way?

> For a majority of _common_ characters made of 2 or 3 codes (western
> language letters, korean Hangul syllables,...), precombined codes have
> been added to the set. Thus, they can be coded with a single code like
> simple characters.
>

Out of curiosity, how do decomposed Hangul characters work? (Or do you
know?) Not actually knowing any Korean, my understanding is that they're a
set of 1 to 4 phonetic glyphs that are then combined into one glyph. So, is
it like a series of base codes that automatically combine, or are there
combining characters involved?

> [Also note, to avoid things being too simple ;-), some (few) combining codes
> called "prepend" come _before_ the base in raw code sequence...]
>

Fun!
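
A minimal Python sketch of the precomposed-versus-decomposed distinction
discussed above, using the standard `unicodedata` module. The sample
characters (ü and the Hangul syllable 한) and the helper names `code_points`
and `combining_classes` are illustrative choices, not something from the
thread, and this only inspects normalization forms; it does not touch
ligatures or "prepend" codes.

```python
import unicodedata


def code_points(text):
    """Return the code points of text as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


def combining_classes(text):
    """Return the canonical combining class of each code (0 = not a combining mark)."""
    return [unicodedata.combining(ch) for ch in text]


# Precomposed "ü" is a single code, U+00FC.
precomposed = "\u00FC"
# NFD splits it into the base "u" plus a combining diaeresis.
decomposed = unicodedata.normalize("NFD", precomposed)

# A precomposed Hangul syllable (한, U+D55C), chosen only as a sample.
hangul = "\uD55C"
# NFD splits it into its individual jamo codes.
hangul_decomposed = unicodedata.normalize("NFD", hangul)

for label, text in [
    ("u-umlaut, precomposed", precomposed),
    ("u-umlaut, decomposed", decomposed),
    ("Hangul, precomposed", hangul),
    ("Hangul, decomposed", hangul_decomposed),
]:
    print(label, code_points(text), combining_classes(text))

# NFC goes the other way and recombines both sequences into one code each.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFC", hangul_decomposed) == hangul)
```

In this sketch the decomposed ü comes out as a base code followed by one code
with a non-zero combining class, while the decomposed Hangul syllable comes
out as three jamo that all have combining class 0. That suggests the Hangul
case is closer to "a series of codes that automatically combine" than to a
base plus diacritic-style combining marks, at least as far as normalization
is concerned.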
