On Tuesday, 31 May 2016 at 16:29:33 UTC, Joakim wrote:
> UTF-8 is an antiquated hack that needs to be eradicated. It forces all other languages than English to be twice as long, for no good reason, have fun with that when you're downloading text on a 2G connection in the developing world.

I assume you're talking about the web here. In that case, plain text makes up only a minor part of the total traffic; the majority is images (binary data), JavaScript and stylesheets (almost pure ASCII), and HTML markup (ditto). It's just not significant, even before taking compression into account, which is ubiquitous.

> It is unnecessarily inefficient, which is precisely why auto-decoding is a problem.

No, inefficiency is the least of the problems with auto-decoding.
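To pin down the term for anyone who hasn't followed the earlier threads: Phobos's range primitives present a string, which is physically an array of UTF-8 code units, as a range of dchar and decode it on the fly. A minimal illustration, nothing more than that:

import std.range.primitives : front, walkLength;
import std.stdio : writeln;

void main()
{
    string s = "héllo";      // stored as UTF-8 code units: immutable(char)[]
    writeln(s.length);       // 6 - code units, no decoding involved
    writeln(s.walkLength);   // 5 - range primitives decode to code points
    writeln(s.front);        // 'h', typed as dchar, decoded on the fly
    foreach (dchar c; s)     // each step decodes one code point
        writeln(c);
}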

> It is only a matter of time till UTF-8 is ditched.

This is ridiculous, even if your other claims were true.


> D devs should lead the way in getting rid of the UTF-8 encoding, not bickering about how to make it more palatable. I suggested a single-byte encoding for most languages, with double-byte for the ones which wouldn't fit in a byte. Use some kind of header or other metadata to combine strings of different languages, _rather than encoding the language into every character!_

I think I remember that post, and - sorry to be so blunt - it was one of the worst things I've ever seen proposed regarding text encoding.


> The common string-handling use case, by far, is strings with only one language, with a distant second some substrings in a second language, yet here we are putting the overhead into every character to allow inserting characters from an arbitrary language! This is madness.

No. The common string-handling use case is code that is unaware which script (not language, btw) your text is in.
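To make that concrete, here is a throwaway sketch (using the stock Phobos tools std.uni.byGrapheme and walkLength): the routine neither knows nor cares which script it is handed, it only decides which unit to count by.

import std.range.primitives : walkLength;
import std.uni : byGrapheme;
import std.stdio : writefln;

// Generic code: it has no idea which script (or mix of scripts) it receives.
void report(string s)
{
    writefln("%s -> %s code units, %s code points, %s graphemes",
             s, s.length, s.walkLength, s.byGrapheme.walkLength);
}

void main()
{
    report("hello");           // ASCII
    report("привет");          // Cyrillic
    report("こんにちは");       // Japanese
    report("he\u0301llo");     // 'e' + combining acute: 1 grapheme, 2 code points
}

The same function runs unchanged on all of them; nothing in it is per-language.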
