On Fri, Jun 03, 2016 at 10:14:15AM +0000, Vladimir Panteleev via Digitalmars-d 
wrote:
> On Friday, 3 June 2016 at 10:08:43 UTC, Walter Bright wrote:
> > On 6/3/2016 1:05 AM, H. S. Teoh via Digitalmars-d wrote:
> > > At the time Unicode also had to grapple with tricky issues like
> > > what to do with lookalike characters that served different
> > > purposes or had different meanings, e.g., the mu sign in the math
> > > block vs. the real letter mu in the Greek block, or the Cyrillic A
> > > which looks and behaves exactly like the Latin A, yet the Cyrillic
> > > Р, which looks like the Latin P, does *not* mean the same thing
> > > (it's the equivalent of R), or the Cyrillic В whose lowercase is в
> > > not b, and also has a different sound, but lowercase Latin b looks
> > > very similar to Cyrillic ь, which serves a completely different
> > > purpose (the uppercase is Ь, not B, you see).
> > 
> > I don't see that this is tricky at all. Adding additional semantic
> > meaning that does not exist in printed form was outside of the
> > charter of Unicode. Hence there is no justification for having two
> > distinct characters with identical glyphs.
> 
> That's not right either. Cyrillic letters can look slightly different
> from their Latin lookalikes in some circumstances.
> 
> I'm sure there are extremely good reasons for not using the Latin
> lookalikes in the Cyrillic alphabets, because most (all?) 8-bit
> Cyrillic encodings use separate codes for the lookalikes. It's not
> restricted to Unicode.

Yeah, lowercase Cyrillic П is п, which looks like lowercase Greek π in
some fonts, but in cursive form it looks more like Latin lowercase n.
It wouldn't make sense to encode Cyrillic п the same as Greek π or Latin
lowercase n just by appearance, since logically it stands as its own
character despite its various appearances.  But it wouldn't make sense
to encode it differently just because you're using a different font!
Similarly, lowercase Cyrillic т in some cursive fonts looks like
lowercase Latin m.  I don't think it would make sense to encode
lowercase т as Latin m just because of that.
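To make the point concrete, here's a minimal sketch in Python (my choice for illustration only; it uses nothing beyond the standard `unicodedata` module, and `describe` is just a throwaway helper name). Whatever a given font does with the glyphs, each of these lookalikes is its own code point with its own name:

```python
import unicodedata

# Lookalikes from above: п / π / n and т / m can share a shape in some
# fonts, but each is a separate character in Unicode.
LOOKALIKES = ["п", "π", "n", "т", "m"]

def describe(chars):
    """Return (character, code point, Unicode name) for each character."""
    return [(c, f"U+{ord(c):04X}", unicodedata.name(c)) for c in chars]

for ch, cp, name in describe(LOOKALIKES):
    print(ch, cp, name)
```

That gives five distinct code points (U+043F, U+03C0, U+006E, U+0442, U+006D), so the font is free to draw them however it likes without changing what they are.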

Eventually you have no choice but to encode by logical meaning rather
than by appearance, since there are many lookalikes between different
languages that actually mean something completely different, and often
behave completely differently.
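Case mapping is one place where that difference in behaviour shows up. Another rough Python sketch, assuming only the standard library (`unicodedata` plus Python's built-in `cp1251` and `koi8_r` codecs; `contrast` and `PAIRS` are made-up names for illustration):

```python
import unicodedata

# Cyrillic letters paired with the Latin letters they resemble in print.
# Only А/A also share a sound; В is "v", Р is "r", and ь is the soft sign.
PAIRS = [("А", "A"), ("В", "B"), ("Р", "P"), ("ь", "b")]

def contrast(cyr: str, lat: str) -> dict:
    """Contrast a Cyrillic letter with its Latin lookalike."""
    return {
        "cyr_name": unicodedata.name(cyr),
        "lat_name": unicodedata.name(lat),
        # Case mapping follows the letter's identity, not its glyph.
        "cyr_swapcase": cyr.swapcase(),
        "lat_swapcase": lat.swapcase(),
        # Legacy 8-bit Cyrillic code pages keep them apart from ASCII too.
        "cp1251": cyr.encode("cp1251"),
        "koi8_r": cyr.encode("koi8_r"),
        "ascii": lat.encode("ascii"),
    }

for cyr, lat in PAIRS:
    info = contrast(cyr, lat)
    print(f"{cyr} vs {lat}: swapcase {info['cyr_swapcase']} vs {info['lat_swapcase']}, "
          f"cp1251 {info['cp1251'].hex()}, koi8-r {info['koi8_r'].hex()}, "
          f"ascii {info['ascii'].hex()}")
```

В swaps to в, not b, and ь swaps to Ь, not B. In both 8-bit code pages the Cyrillic letters land above 0x7F, away from their ASCII lookalikes, which fits what Vladimir said about the pre-Unicode encodings.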


T

-- 
People say I'm indecisive, but I'm not sure about that. -- YHL, CONLANG
