On 03/06/2016 20:12, Dmitry Olshansky wrote:
On 02-Jun-2016 23:27, Walter Bright wrote:

I wonder what rationale there is for Unicode to have two different
sequences of codepoints be treated as the same. It's madness.

Yeah, Unicode was not meant to be easy it seems. Or this is whatever
happens with evolutionary design that started with "everything is a
16-bit character".


Typing as someone who has spent some time creating typefaces: having two representations makes sense, and it didn't start with Unicode; it started with movable type.

It is much easier for a font designer to create the two-codepoint versions of characters in most instances, i.e. make the base letters and the diacritics once. Then what I often do is make single-codepoint versions of the ones I'm likely to use, but only if they need more tweaking than the kerning options of the font format allow. I'll omit the history lesson on how this was similar in the case of movable type.

Keyboards for different languages mean that a character that is a single keystroke in one case is two keystrokes, together or in sequence, in another. This means that Unicode not only represents completed strings, but also those that are mid-composition. The ordering that it uses to ensure that graphemes have a single canonical representation is based on the order in which those multi-key characters are entered. I wouldn't call it elegant, but it's not inelegant either.
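To make the two-representation point concrete, here is a small sketch using Python's standard `unicodedata` module (an assumption on my part; the thread is about D, but the behavior is the same in any conformant Unicode library). NFC composes a base letter plus combining mark into the precomposed codepoint where one exists, NFD decomposes it, and canonical reordering gives combining-mark sequences a single canonical order:

```python
import unicodedata

precomposed = "\u00e9"        # é as a single codepoint
combining   = "e\u0301"       # e followed by COMBINING ACUTE ACCENT

# The two sequences differ byte-for-byte but normalize to the same string.
print(precomposed == combining)                                    # False
print(unicodedata.normalize("NFC", combining) == precomposed)      # True
print(unicodedata.normalize("NFD", precomposed) == combining)      # True

# Canonical ordering: dot-below (combining class 220) and dot-above
# (class 230) can be typed in either order; normalization reorders
# them into one canonical sequence.
a = "q\u0307\u0323"   # q + dot above + dot below
b = "q\u0323\u0307"   # q + dot below + dot above
print(unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b))  # True
```

This is why comparing Unicode strings codepoint-by-codepoint without normalizing first gives surprising results.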

Trying to represent all sufficiently similar glyphs with the same codepoint would lead to a layout problem. How would you order them so that strings of any language can be sorted by their local sorting rules, without having to special-case the algorithms?

Also consider ligatures, such as those for "ff", "fi", "ffi", "fl", "ffl" and many, many more. Typographers create these glyphs whenever the available kerning tools do a poor job of combining them from the individual glyphs. From the point of view of meaning, they should still be represented as individual codepoints, but for display (electronic or print) that sequence needs to be replaced with the single codepoint for the ligature.
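Unicode actually encodes some of these ligatures as compatibility characters, and the "same meaning, different display form" relationship is captured by compatibility normalization (NFKC/NFKD). A minimal sketch, again assuming Python's `unicodedata`:

```python
import unicodedata

ligature = "\ufb03"   # U+FB03 LATIN SMALL LIGATURE FFI, a single codepoint

# As stored, the ligature is not equal to the three-letter sequence...
print(ligature == "ffi")                                  # False

# ...but compatibility normalization maps it back to the individual
# codepoints that carry the meaning.
print(unicodedata.normalize("NFKC", ligature) == "ffi")   # True
```

Note that NFC deliberately leaves the ligature alone; only the compatibility forms fold display variants into their plain-letter equivalents, which is why searching and sorting pipelines often apply NFKC while text that must round-trip exactly does not.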

I think that in order to understand the decisions of the Unicode committee, one has to consider that they are trying to unify the concerns of representing written information from two sides. One side prioritises storage and manipulation, while the other considers aesthetics and design workflow more important. My experience of using Unicode from both sides gives me a different appreciation for the difficulties of reconciling the two.

A...

P.S.

Then they started adding emojis, and I lost all faith in humanity ;)
