On 6/4/2015 17:03 , "Chris" wrote:
This whole discussion is about the fact that it would be technically possible to have private character sets and private agreements that your OS downloads without the user being aware of it.

The sticky issues are not the questions of how to make available fonts or images for use by the OS.

Instead, they concern the fact that any such model violates some pretty basic guarantees of plain text that the entire net infrastructure relies on.

There are very obvious security issues. They start with tracking: every time you access a custom code point, that fact potentially results in a trackable interaction. This problem affects even the "sticker" solution that people are hoping to see for emoji. (On my system, no external resources are displayed when I first open any message, and there is a reason for that.)

Beyond tracking, and beyond stickers (that is, pictures that look like pictures), a generalized custom character set would allow "text" that is no longer really stable. You would be able to deliver identical e-mails that display differently to different people, because when you serve the custom fonts, you would be able to customize what you deliver under the same custom character set designator.

While this would be a wonderful way to circumvent censorship (other than the "man in the middle" version), you would likewise seriously undermine the ability to filter unwanted or undesirable texts, because the custom character set engine might recognize when a request comes from a filter and not the end user. (Just the other day, I came across a hacked website that responded differently to search engines than to live users, making the hack effective for one and invisible to the other. Custom character sets would seem to just add to the hackers' arsenal here.)

Finally, custom character sets sound like a great idea when thinking of an extension of an existing character set. But that's not where the issues are. The issues come in when you use the same technology to provide aliases for existing code points or for other custom characters.

Aliasing undermines the ability to do search (or any other content-focused processing, from sorting to spell-check).
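
To make the search problem concrete, here is a minimal sketch in Python. It assumes a purely hypothetical private agreement under which a custom font draws the private-use code point U+E000 with the same glyph as the letter "a"; the names ALIAS_FOR_A, aliased and fullwidth are made up for illustration. For contrast, it also shows a standardized alias (fullwidth letters), which NFKC normalization can fold because the mapping is published.

```python
import unicodedata

# Hypothetical private agreement: a custom font draws U+E000 (a private-use
# code point) with the same glyph as the letter "a".
ALIAS_FOR_A = "\ue000"

# Displays as "bank account" under that font, but the code points differ,
# so a plain substring search does not find "bank".
aliased = "b" + ALIAS_FOR_A + "nk account"
found_aliased = "bank" in aliased

# Standardized aliases are different: fullwidth letters have published
# compatibility mappings, so NFKC normalization folds them to ASCII.
fullwidth = "\uff42\uff41\uff4e\uff4b"  # fullwidth "bank"
found_fullwidth = "bank" in unicodedata.normalize("NFKC", fullwidth)

# No published mapping exists for the private alias, so normalization
# leaves the string untouched and the search still fails afterwards.
unchanged_by_nfkc = unicodedata.normalize("NFKC", aliased) == aliased

print(found_aliased, found_fullwidth, unchanged_by_nfkc)
```

The search fails for the private alias and succeeds for the fullwidth one. The only way to repair the first case would be to ship the private mapping to every piece of software that touches the text, which is exactly the patchwork situation described next.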

At that point, the circle closes.

When Unicode was created, the alternative then was ISO 2022, which was a standard that addressed the issue of how to switch among (albeit pre-defined) character sets to achieve, in principle, coverage equal to the union of these character sets.

Unicode was created to address two main deficiencies of that situation. Unification addressed the aliasing issue, so that code points were no longer "opaque" but could be interpreted by software (other than display); that opacity was the second big drawback of the patchwork of character sets. A processing model for opaque code points is possible to define, but it isn't very practical, and in the late eighties people had had enough and were glad to be quit of it.
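
What "opaque" means for software can be sketched with Python's standard unicodedata module. The helper describe() is a made-up name for this illustration, and U+E000 merely stands in for any privately defined character; this is a sketch of the general point, not a statement about any particular proposal.

```python
import unicodedata

def describe(ch):
    """Collect what generic software can learn about one character."""
    return {
        "name": unicodedata.name(ch, "<no name>"),
        "category": unicodedata.category(ch),
        "is_letter": ch.isalpha(),
        "uppercase": ch.upper(),
    }

# A standardized code point: name, category and case mapping are all known.
standard = describe("\u00e9")  # LATIN SMALL LETTER E WITH ACUTE

# A private-use code point: category "Co", no name, no case mapping.
# Whatever a private agreement says it means is invisible at this level.
private = describe("\ue000")

print(standard)
print(private)
```

For the standardized character, software can uppercase it, classify it as a letter, and so on without ever displaying it. For the private-use one, every such operation would have to consult the private agreement, if it can be found at all.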

Seen from this perspective, the discussion about custom character sets presents itself as a giant step backward, undermining the very advances that underlie the rapid acceptance and spread of Unicode.

A./
