On Wednesday, 1 June 2016 at 10:04:42 UTC, Marc Schütz wrote:
> On Tuesday, 31 May 2016 at 16:29:33 UTC, Joakim wrote:
>> UTF-8 is an antiquated hack that needs to be eradicated. It
>> forces all languages other than English to be twice as long,
>> for no good reason. Have fun with that when you're downloading
>> text on a 2G connection in the developing world.
> I assume you're talking about the web here. In this case, plain
> text makes up only a minor part of the entire traffic; the
> majority of it is images (binary data), JavaScript and
> stylesheets (almost pure ASCII), and HTML markup (ditto). It's
> likely not significant even without taking compression into
> account, which is ubiquitous.
No, I explicitly said not the web in a subsequent post. The
ignorance here of what 2G speeds are like is mind-boggling.
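For what it's worth, the size difference is easy to measure. A
minimal D sketch, using Cyrillic as one example of a script whose
letters each take two bytes in UTF-8:

import std.range : walkLength;
import std.stdio;

void main()
{
    string english = "hello";  // 5 letters, 1 byte each in UTF-8
    string russian = "привет"; // 6 letters, 2 bytes each in UTF-8

    // .length counts UTF-8 code units (bytes); walkLength counts
    // code points, since ranges auto-decode strings to dchar.
    writeln(english.length, " / ", english.walkLength); // 5 / 5
    writeln(russian.length, " / ", russian.walkLength); // 12 / 6
}

The same six letters fit in six bytes in a dedicated single-byte
encoding such as KOI8-R; that doubling is the overhead being
complained about here.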
>> It is unnecessarily inefficient, which is precisely why
>> auto-decoding is a problem.
> No, inefficiency is the least of the problems with
> auto-decoding.
Right... that's why this 200-post thread was spawned with that as
the main reason.
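To make the mechanics concrete, here is a small D sketch of what
auto-decoding does; an illustration, not a benchmark:

import std.algorithm.searching : count;
import std.stdio;
import std.utf : byCodeUnit;

void main()
{
    string s = "caf\u00E9"; // 'é' is U+00E9, two bytes in UTF-8

    writeln(s.length);           // 5: code units (bytes), no decoding
    writeln(s.count);            // 4: range algorithms decode to dchar
    writeln(s.byCodeUnit.count); // 5: byCodeUnit opts out of decoding
}

The decoding in the middle line happens whether or not the
algorithm actually needs code points, which is the cost this
thread is about.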
>> It is only a matter of time till UTF-8 is ditched.
> This is ridiculous, even if your other claims were true.
The UTF-8 encoding is what's ridiculous.
>> D devs should lead the way in getting rid of the UTF-8
>> encoding, not bickering about how to make it more palatable.
>> I suggested a single-byte encoding for most languages, with
>> double-byte for the ones which wouldn't fit in a byte. Use
>> some kind of header or other metadata to combine strings of
>> different languages, _rather than encoding the language into
>> every character!_
> I think I remember that post, and - sorry to be so blunt - it
> was one of the worst things I've ever seen proposed regarding
> text encoding.
Well, when you _like_ a ludicrous encoding like UTF-8, not sure
your opinion matters.
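To spell it out, here is a rough sketch of the kind of scheme
that post proposed; the names and layout are invented purely for
illustration:

// One header byte selects the script table; the payload is then
// one byte per character (two for the larger scripts), with no
// per-character script overhead.
enum ScriptTag : ubyte { latin, cyrillic, greek /* etc. */ }

struct TaggedString
{
    ScriptTag tag;
    immutable(ubyte)[] payload;
}

// Mixing scripts requires a sequence of tagged segments; this
// per-segment metadata is what replaces UTF-8's per-character
// cost.
struct MultiScriptString
{
    TaggedString[] segments;
}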
>> The common string-handling use case, by far, is strings with
>> only one language, with a distant second some substrings in a
>> second language, yet here we are putting the overhead into
>> every character to allow inserting characters from an
>> arbitrary language! This is madness.
> No. The common string-handling use case is code that is unaware
> which script (not language, btw) your text is in.
Lol, this may be the dumbest argument put forth yet.
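For reference, "script-unaware" code means things like the
following, which works on any text without decoding, because no
UTF-8 character's byte sequence can occur inside another's:

import std.algorithm.searching : canFind;
import std.utf : byCodeUnit;

// Byte-level substring search is valid for any script in UTF-8.
bool contains(string haystack, string needle)
{
    return haystack.byCodeUnit.canFind(needle.byCodeUnit);
}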
I don't think anyone here even understands what a good encoding
is and what it's for, which is why there's no point in debating
this.