On 2015/06/03 07:55, Chris wrote:

As you point out, "The UCS will not encode characters without a demonstrated 
usage." But there are use cases for characters that don't meet the UCS's criteria for a 
worldwide standard, yet are necessary in more specific contexts, like specialised 
regional, business, or domain-specific situations.

Unicode contains *a lot* of characters for specialized regional, business, or domain-specific situations.

My question is, given that Unicode can't realistically (and doesn't aim to) 
encode every possible symbol in the world, why shouldn't there be an EXTENSIBLE 
method for encoding, so that people don't have to totally rearchitect their 
computing universe because they want ONE non-standard character in their 
documents?

As has been explained, there are technologies that allow you to do (more or less) that. Information technology, like many other technologies, works best when it serves common cases shared by many people. Let's look at some examples:

Character encodings work best when they are used widely and uniformly. I don't know anybody who actually uses all the characters in Unicode (except the guys that work on the standard itself). So for each individual, a smaller set would be okay. And there were (and are) smaller sets, not for individuals, but for countries, regions, scripts, and so on. Originally (when memory was very limited), these legacy encodings were more efficient overall, but that's no longer the case. So everything is moving towards Unicode.

Most Website creators don't use all the features in HTML5. So having different subsets for different use cases may seem convenient. But overall, it's much more efficient to have one Hypertext Markup Language, so that's where everybody is converging.

From your viewpoint, it looks like having something in between character encodings and HTML is what you want. It would only contain the features you need, and nothing more, and would work in all the places you wanted it to work. Asmus's "inline" text may be something similar.

The problem is that such an intermediate technology only makes sense if it covers the needs of lots and lots of people. It would add a third technology level (between plain text and marked-up text), which would divert energy from the current two levels and make things more complicated.

Up to now, such a third level hasn't emerged, among other reasons because both existing technologies were good at absorbing the most important use cases from the middle. Unicode continues to encode symbols that gain reasonable popularity, so every time somebody has a "real good use case" for the middle layer involving a symbol that isn't yet in Unicode, that use case gets taken away. HTML (or Web technology in general) has also worked to improve the situation, with technologies such as SVG and Web Fonts.
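To make that concrete, here is a minimal sketch of how a one-off symbol can already be put into a Web document today, without waiting for a new code point. Both the SVG drawing and the font name/URL below are made up for illustration:

```html
<!-- Option 1: draw the custom symbol inline with SVG;
     no new character is needed at all. -->
<p>The rate is 5 <svg width="1em" height="1em" viewBox="0 0 10 10"
     style="vertical-align:-0.1em" role="img" aria-label="custom unit">
  <circle cx="5" cy="5" r="4" fill="none" stroke="currentColor"/>
  <line x1="2" y1="5" x2="8" y2="5" stroke="currentColor"/>
</svg> per day.</p>

<!-- Option 2: a Web Font that maps a Private Use Area code point
     (here U+E000; font name and URL are hypothetical) to a custom glyph. -->
<style>
  @font-face {
    font-family: "MySymbols";       /* hypothetical font name */
    src: url("my-symbols.woff2");   /* hypothetical URL */
  }
  .sym { font-family: "MySymbols"; }
</style>
<p>The rate is 5 <span class="sym">&#xE000;</span> per day.</p>
```

The trade-off is exactly the one discussed above: the SVG variant isn't plain text at all, and the PUA variant only displays correctly where that particular font is available, which is why neither replaces actual encoding in Unicode.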

No technology is perfect, and so there are still some gaps between character encoding and markup. Some of these may eventually be filled, but I don't think a third layer in the middle will emerge soon.

Regards,   Martin.
