On Friday, June 03, 2016 03:08:43 Walter Bright via Digitalmars-d wrote:
> On 6/3/2016 1:05 AM, H. S. Teoh via Digitalmars-d wrote:
> > At the time Unicode also had to grapple with tricky issues like what
> > to do with lookalike characters that served different purposes or had
> > different meanings, e.g., the mu sign in the math block vs. the real
> > letter mu in the Greek block, or the Cyrillic A which looks and
> > behaves exactly like the Latin A, yet the Cyrillic Р, which looks
> > like the Latin P, does *not* mean the same thing (it's the equivalent
> > of R), or the Cyrillic В whose lowercase is в not b, and also had a
> > different sound, but lowercase Latin b looks very similar to Cyrillic
> > ь, which serves a completely different purpose (the uppercase is Ь,
> > not B, you see).
>
> I don't see that this is tricky at all. Adding additional semantic
> meaning that does not exist in printed form was outside of the charter
> of Unicode. Hence there is no justification for having two distinct
> characters with identical glyphs.
>
> They should have put me in charge of Unicode. I'd have put a stop to
> much of the madness :-)
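The lookalike pairs Teoh lists are easy to inspect programmatically; here is a quick Python sketch (standard library only) showing that each pair has distinct code points even where the glyphs coincide, and that the standard itself records the relationship via compatibility normalization:

```python
import unicodedata

# The micro sign from the symbols range vs. the real Greek letter mu.
micro, mu = "\u00B5", "\u03BC"
print(hex(ord(micro)), unicodedata.name(micro))  # 0xb5 MICRO SIGN
print(hex(ord(mu)), unicodedata.name(mu))        # 0x3bc GREEK SMALL LETTER MU

# Compatibility normalization (NFKC) folds the micro sign into the
# Greek mu, so Unicode does acknowledge the lookalike relationship.
print(unicodedata.normalize("NFKC", micro) == mu)  # True

# Cyrillic Р looks like Latin P but is a different letter (ER, i.e. R).
print(unicodedata.name("\u0420"))  # CYRILLIC CAPITAL LETTER ER
print(unicodedata.name("\u0050"))  # LATIN CAPITAL LETTER P

# Cyrillic В lowercases to в, not b; Ь (the soft sign) lowercases to ь.
print("\u0412".lower() == "\u0432")  # True
print("\u042C".lower() == "\u044C")  # True
```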
Actually, I would argue that the moment Unicode is concerned with what a character actually looks like rather than what character it logically is, it has gone outside of its charter. The way characters actually look is far too dependent on fonts, and aside from display code, code does not care one whit what a character looks like.

For instance, take the capital letter I, the lowercase letter l, and the number one. In some fonts that are feeling cruel towards folks who actually want to read them, two of those characters - or even all three of them - look identical. But I think that you'll agree that those characters should be represented as distinct characters in Unicode regardless of what they happen to look like in a particular font.

Now, take a Cyrillic letter that looks similar to a Latin letter. If they're logically equivalent, such that no code would ever want to distinguish between the two and no font would ever even consider representing them differently, then they're truly the same letter, and they should have only one Unicode representation. But if anyone would ever consider them to be logically distinct, then it makes no sense for Unicode to treat them as the same character, because they don't have the same identity. And that distinction is quite clear if any font would ever consider representing the two characters differently, no matter how slight that difference might be.

Really, what a character looks like has nothing to do with Unicode. The exact same Unicode is used regardless of how the text is displayed. Rather, what Unicode is doing is providing logical identifiers for characters so that code can operate on them, and display code can then do whatever it does to display those characters, whether they happen to look similar or not.
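The identity-over-appearance point can be made concrete; a small Python sketch (standard library only, characters chosen for illustration) showing that code operates on logical identity, not on glyphs:

```python
# I, l, and 1 may render identically in a cruel font, but they are
# three distinct code points, and comparison treats them as distinct.
print([hex(ord(c)) for c in "Il1"])  # ['0x49', '0x6c', '0x31']
print("I" == "l")  # False

# Latin A and Cyrillic А are visually identical in most fonts, yet
# they are distinct characters with distinct identities...
latin_a, cyrillic_a = "\u0041", "\u0410"
print(latin_a == cyrillic_a)  # False

# ...and that identity drives behavior: case mapping depends on which
# logical letter it is, not on what the glyph happens to look like.
print("B".lower())       # b  (Latin B)
print("\u0412".lower())  # в  (Cyrillic VE)
```

Display code is free to draw `\u0041` and `\u0410` with the same pixels; nothing in the comparisons above changes when it does.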
I would think that the fact that non-display code does not care one whit what a character looks like, and that display code can have drastically different visual representations of the same character, would make it clear that Unicode is concerned with having identifiers for logical characters, and that that is distinct from any visual representation.

- Jonathan M Davis