On Wednesday, 1 June 2016 at 10:04:42 UTC, Marc Schütz wrote:
> On Tuesday, 31 May 2016 at 16:29:33 UTC, Joakim wrote:
>> UTF-8 is an antiquated hack that needs to be eradicated. It
>> forces all languages other than English to be twice as long,
>> for no good reason. Have fun with that when you're downloading
>> text on a 2G connection in the developing world.
> I assume you're talking about the web here. In this case, plain
> text makes up only a minor part of the entire traffic; the
> majority of it is images (binary data), JavaScript and
> stylesheets (almost pure ASCII), and HTML markup (ditto). It's
> likely not significant even without taking compression into
> account, which is ubiquitous.
No, I explicitly said not the web in a subsequent post. The
ignorance here of what 2G speeds are like is mind-boggling.
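For what it's worth, the size difference is easy to measure. A
minimal D sketch, using Cyrillic as one example of a script whose
letters each take two bytes in UTF-8:

import std.range : walkLength;
import std.stdio;

void main()
{
    string english = "hello";  // 5 letters, 1 byte each in UTF-8
    string russian = "привет"; // 6 letters, 2 bytes each in UTF-8

    // .length counts UTF-8 code units (bytes); walkLength counts
    // code points, since ranges auto-decode strings to dchar.
    writeln(english.length, " / ", english.walkLength); // 5 / 5
    writeln(russian.length, " / ", russian.walkLength); // 12 / 6
}

The same six letters fit in six bytes in a dedicated single-byte
encoding such as KOI8-R; that doubling is the overhead being
complained about here.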
>> It is unnecessarily inefficient, which is precisely why
>> auto-decoding is a problem.
> No, inefficiency is the least of the problems with
> auto-decoding.
Right... that's why this 200-post thread was spawned with that as
the main reason.
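To make the mechanics concrete, here is a small D sketch of what
auto-decoding does; an illustration, not a benchmark:

import std.algorithm.searching : count;
import std.stdio;
import std.utf : byCodeUnit;

void main()
{
    string s = "caf\u00E9"; // 'é' is U+00E9, two bytes in UTF-8

    writeln(s.length);           // 5: code units (bytes), no decoding
    writeln(s.count);            // 4: range algorithms decode to dchar
    writeln(s.byCodeUnit.count); // 5: byCodeUnit opts out of decoding
}

The decoding in the middle line happens whether or not the
algorithm actually needs code points, which is the cost this
thread is about.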
>> It is only a matter of time till UTF-8 is ditched.
> This is ridiculous, even if your other claims were true.
The UTF-8 encoding is what's ridiculous.
>> D devs should lead the way in getting rid of the UTF-8
>> encoding, not bickering about how to make it more palatable.
>> I suggested a single-byte encoding for most languages, with
>> double-byte for the ones which wouldn't fit in a byte. Use
>> some kind of header or other metadata to combine strings of
>> different languages, _rather than encoding the language into
>> every character!_
> I think I remember that post, and - sorry to be so blunt - it
> was one of the worst things I've ever seen proposed regarding
> text encoding.
Well, when you _like_ a ludicrous encoding like UTF-8, not sure
your opinion matters.
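To spell it out, here is a rough sketch of the kind of scheme
that post proposed; the names and layout are invented purely for
illustration:

// One header byte selects the script table; the payload is then
// one byte per character (two for the larger scripts), with no
// per-character script overhead.
enum ScriptTag : ubyte { latin, cyrillic, greek /* etc. */ }

struct TaggedString
{
    ScriptTag tag;
    immutable(ubyte)[] payload;
}

// Mixing scripts requires a sequence of tagged segments; this
// per-segment metadata is what replaces UTF-8's per-character
// cost.
struct MultiScriptString
{
    TaggedString[] segments;
}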
>> The common string-handling use case, by far, is strings with
>> only one language, with a distant second some substrings in a
>> second language, yet here we are putting the overhead into
>> every character to allow inserting characters from an
>> arbitrary language! This is madness.
> No. The common string-handling use case is code that is unaware
> which script (not language, btw) your text is in.
Lol, this may be the dumbest argument put forth yet.
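For reference, "script-unaware" code means things like the
following, which works on any text without decoding, because no
UTF-8 character's byte sequence can occur inside another's:

import std.algorithm.searching : canFind;
import std.utf : byCodeUnit;

// Byte-level substring search is valid for any script in UTF-8.
bool contains(string haystack, string needle)
{
    return haystack.byCodeUnit.canFind(needle.byCodeUnit);
}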
I don't think anyone here even understands what a good encoding
is and what it's for, which is why there's no point in debating
this.