On Wednesday, 1 June 2016 at 18:30:25 UTC, Wyatt wrote:
On Wednesday, 1 June 2016 at 16:45:04 UTC, Joakim wrote:
On Wednesday, 1 June 2016 at 15:02:33 UTC, Wyatt wrote:
It's not hard. I think a lot of us remember when a 14.4 modem was cutting-edge.

Well, then apparently you're unaware of how bloated web pages are nowadays. It used to take me minutes to download popular web pages _back then_ at _top speed_, and those pages were a _lot_ smaller.

It's telling that you think the encoding of the text is anything but the tiniest fraction of the problem. You should look at where the actual weight of a "modern" web page comes from.

I'm well aware that text is a small part of it. My point is that they're not downloading those web pages, they're using mobile instead, as I explicitly said in a prior post. My only point in mentioning the web bloat to you is that _your perception_ is off because you seem to think they're downloading _current_ web pages over 2G connections, and comparing it to your downloads of _past_ web pages with modems. Not only did it take minutes for us back then, it takes _even longer_ now.

I know the text encoding won't help much with that. Where it will help is the mobile apps they're actually using, not the bloated websites they don't use.

Codepages and incompatible encodings were terrible then, too.

Never again.

This only shows you probably don't know the difference between an encoding and a code page,

"I suggested a single-byte encoding for most languages, with double-byte for the ones which wouldn't fit in a byte. Use some kind of header or other metadata to combine strings of different languages, _rather than encoding the language into every character!_"

Yeah, that? That's codepages. And your exact proposal to put encodings in the header was ALSO tried around the time that Unicode was getting hashed out. It sucked. A lot. (Not as bad as storing it in the directory metadata, though.)

You know what's also codepages? Unicode. The UCS is a standardized set of code pages for each language, often merely picking the most popular code page at that time.

I don't doubt that everything I'm saying has been tried in some form before. The question is whether that alternate form would be better if designed and implemented properly, not whether a botched design/implementation has ever been attempted.
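
To make the idea concrete, here's a rough sketch of the kind of header scheme I have in mind, assuming a hypothetical per-string language tag followed by a one-byte-per-character payload; the tag values and the CP1251-style byte values are placeholders, not any existing standard or proposal:

import std.stdio;

// Hypothetical layout: a small header naming the language, followed by a
// single-byte-per-character payload interpreted against that language's
// alphabet.  Purely illustrative.
struct TaggedString
{
    ushort langId;    // placeholder tag, e.g. 0x0002 for Cyrillic
    ubyte[] payload;  // one byte per character
}

size_t encodedSize(const TaggedString s)
{
    return ushort.sizeof + s.payload.length;  // 2-byte header + 1 byte/char
}

void main()
{
    // "привет" as six single-byte characters (CP1251-style values):
    // 8 bytes with the header, versus 12 bytes in UTF-8.
    ubyte[] cyr = [0xEF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2];
    auto s = TaggedString(0x0002, cyr);
    writeln(encodedSize(s));  // prints 8
}

Mixing languages in one string is of course where the metadata gets more involved, which is exactly the trade-off being argued over here.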

Well, when you _like_ a ludicrous encoding like UTF-8, not sure your opinion matters.

It _is_ kind of ludicrous, isn't it? But it really is the least-bad option for the most text. Sorry, bub.

I think we can do a lot better.

Maybe.  But no one's done it yet.

That's what people said about mobile devices for a long time, until about a decade ago. It's time we got this right.

The vast majority of software is written for _one_ language, the local one. You may think otherwise because the software that sells the most and makes the most money is internationalized software like Windows or iOS, since it can be resold into many markets. But as a percentage of lines of code written, such international code is almost nothing.

I'm surprised you think this even matters after talking about web pages. The browser is your most common string processing situation. Nothing else even comes close.

No, it's certainly popular software, but at the scale we're talking about, ie all string processing in all software, it's fairly small. And the vast majority of webapps that handle strings passed from a browser are written to only handle one language, the local one.

largely ignoring the possibilities of the header scheme I suggested.

"Possibilities" that were considered and discarded decades ago by people with way better credentials. The era of single-byte encodings is gone, it won't come back, and good riddance to bad rubbish.

Lol, credentials. :D If you think that matters at all in the face of the blatant stupidity embodied by UTF-8, I don't know what to tell you.

I could call that "trolling" by all of you, :) but I'll instead call it what it likely is, reactionary thinking, and move on.

It's not trolling to call you out for clearly not doing your homework.

That's funny, because it's precisely you and others who haven't done your homework. So are you all trolling me? By your definition of trolling, which btw is not the standard one, _you_ are the one doing it.

I don't think you understand: _you_ are the special case.

Oh, I understand perfectly. _We_ (whoever "we" are) can handle any sequence of glyphs and combining characters (correctly-formed or not) in any language at any time, so we're the special case...?

And you're doing so by mostly using a single-byte encoding for _your own_ Euro-centric languages, ie ASCII, while imposing unnecessary double-byte and triple-byte encodings on everyone else, despite their outnumbering you 10 to 1. That is the very definition of a special case.
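
To put numbers on that asymmetry, here's a quick check of code points versus UTF-8 bytes (D's string .length counts bytes, std.range.walkLength counts code points); the sample strings are arbitrary, one per script:

import std.range : walkLength;
import std.stdio : writefln;

void main()
{
    // .length counts UTF-8 code units (bytes); walkLength counts code points.
    foreach (s; ["hello", "привет", "नमस्ते"])
        writefln("%s: %s code points, %s bytes", s, s.walkLength, s.length);
    // hello:  5 code points,  5 bytes  (ASCII, 1 byte per character)
    // привет: 6 code points, 12 bytes  (Cyrillic, 2 bytes per character)
    // नमस्ते:   6 code points, 18 bytes  (Devanagari, 3 bytes per character)
}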

Yeah, it sounds funny to me, too.

I'm happy to hear you find your privilege "funny," but I'm sorry to tell you, it won't last.

The 5 billion people outside the US and EU are _not the special case_.

Fortunately, it works for them too.

At a higher and unnecessary cost, which is why it won't last.

The problem is all the rest, and those just below them who cannot afford it at all, in part because the tech is not yet as efficient as it could be. Ditching UTF-8 will be one way to make it more efficient.

All right, now you've found the special case; the case where the generic, unambiguous encoding may need to be lowered to something else: people for whom that encoding is suboptimal because of _current_ network constraints.

I fully acknowledge it's a couple billion people and that's nothing to sneeze at, but I also see that it's a situation that will become less relevant over time.

I continue to marvel at your calling a couple billion people "the special case," presumably thinking ~700 million people in the US and EU primarily using the single-byte encoding of ASCII are the general case.

As for the continued relevance of such constrained use, I suggest you read the link Marco provided above. The vast majority of the worldwide literate population doesn't have a smartphone or use a cellular data plan, whereas the opposite is true if you include featurephones, largely because those can be used only for voice. As that article notes, costs for smartphones and 2G data plans will have to come down for them to go wider. That will take decades to roll out, though the basic tech design is mostly settled by now.

The costs will go down by making the tech more efficient, and ditching UTF-8 will be one of the ways the tech will be made more efficient.
