On Friday, June 03, 2016 15:38:38 Walter Bright via Digitalmars-d wrote:
> On 6/3/2016 2:10 PM, Jonathan M Davis via Digitalmars-d wrote:
> > Actually, I would argue that the moment that Unicode is concerned with
> > what the character actually looks like rather than what character it
> > logically is that it's gone outside of its charter. The way that
> > characters actually look is far too dependent on fonts, and aside from
> > display code, code does not care one whit what the character looks like.
>
> What I meant was pretty clear. Font is an artistic style that does not
> change context nor semantic meaning. If a font choice changes the meaning
> then it is not a font.
Well, maybe I misunderstood what was being argued, but it seemed like you've been arguing that two characters should be considered the same just because they look similar, whereas H. S. Teoh is arguing that two characters can be logically distinct while still looking similar, and that they should be treated as distinct in Unicode because they're logically distinct. If that's what's being argued, then I agree with H. S. Teoh.

I would expect - at least ideally - Unicode to contain identifiers for characters that are distinct from whatever their visual representation might be. Fonts then worry about how to display them, and hopefully don't do stupid stuff like making a capital I look like a lowercase l (though they often do, unfortunately). But if two characters in different scripts - be they Latin and Cyrillic or whatever - often happen to look the same but would be considered two different characters by humans, then I would expect Unicode to consider them to be different, whereas if no one would reasonably consider them to be anything but exactly the same character, then there should only be one character in Unicode.

However, if we really have crazy stuff where subtly different visual representations of the letter g are considered to be one character in English and two in Russian, then maybe those should be three different characters in Unicode, so that English text can clearly be operating on g, whereas Russian text is doing whatever it does with its two characters that happen to look like g. I don't know. That sort of thing just gets ugly. But I definitely think that Unicode characters should correspond to the logical characters and leave the visual representation up to the fonts and the like.

Now, how to deal with uppercase vs. lowercase and all of that sort of stuff is a completely separate issue IMHO. That comes down to how the characters are logically associated with one another, and it's so locale-specific that it's not really part of the core of Unicode's charter (though I'm not sure that it's bad if there's a set of locale rules that goes along with Unicode for those looking to apply such rules correctly - they just have nothing to do with code points and graphemes and how they're represented in code).

- Jonathan M Davis
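
P.S. For anyone who wants to see the Latin/Cyrillic case concretely, here's a minimal D sketch (just an illustration I put together, not anything from the thread): Latin capital A (U+0041) and Cyrillic capital А (U+0410) render identically in most fonts, but they're separate code points and compare as unequal.

import std.stdio;

void main()
{
    // Latin capital A (U+0041) and Cyrillic capital A (U+0410) look the
    // same in most fonts, but Unicode treats them as distinct characters
    // because they are logically distinct.
    dchar latinA    = '\u0041';
    dchar cyrillicA = '\u0410';

    writeln(latinA == cyrillicA);   // false
    writefln("U+%04X vs U+%04X",
             cast(uint) latinA, cast(uint) cyrillicA);   // U+0041 vs U+0410

    // Likewise, a string containing the Cyrillic letter is not equal to
    // one containing the Latin letter, even though both display as "A".
    writeln("А" == "A");            // false
}

So code that only cares about the logical character gets the right answer regardless of how a font happens to draw the glyphs.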
