On Wednesday, 1 June 2016 at 18:30:25 UTC, Wyatt wrote:
On Wednesday, 1 June 2016 at 16:45:04 UTC, Joakim wrote:
On Wednesday, 1 June 2016 at 15:02:33 UTC, Wyatt wrote:
It's not hard. I think a lot of us remember when a 14.4 modem was cutting-edge.

Well, then apparently you're unaware of how bloated web pages are nowadays. It used to take me minutes to download popular web pages _back then_ at _top speed_, and those pages were a _lot_ smaller.

It's telling that you think the encoding of the text is anything but the tiniest fraction of the problem. You should look at where the actual weight of a "modern" web page comes from.

I'm well aware that text is a small part of it. My point is that they're not downloading those web pages, they're using mobile instead, as I explicitly said in a prior post. My only point in mentioning the web bloat to you is that _your perception_ is off because you seem to think they're downloading _current_ web pages over 2G connections, and comparing it to your downloads of _past_ web pages with modems. Not only did it take minutes for us back then, it takes _even longer_ now.

I know the text encoding won't help much with that. Where it will help is the mobile apps they're actually using, not the bloated websites they don't use.

Codepages and incompatible encodings were terrible then, too.

Never again.

This only shows you probably don't know the difference between an encoding and a code page,

"I suggested a single-byte encoding for most languages, with double-byte for the ones which wouldn't fit in a byte. Use some kind of header or other metadata to combine strings of different languages, _rather than encoding the language into every character!_"

Yeah, that? That's codepages. And your exact proposal to put encodings in the header was ALSO tried around the time that Unicode was getting hashed out. It sucked. A lot. (Not as bad as storing it in the directory metadata, though.)

You know what's also codepages? Unicode. The UCS is a standardized set of code pages for each language, often merely picking the most popular code page at that time.

I don't doubt that everything I'm saying has been tried in some form before. The question is whether that alternate form would be better if designed and implemented properly, not whether a botched design/implementation has ever been attempted.
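
To make the idea concrete, here's a rough sketch of the kind of header scheme I have in mind, assuming a hypothetical per-string language tag followed by a one-byte-per-character payload; the tag values and the CP1251-style byte values are placeholders, not any existing standard or proposal:

import std.stdio;

// Hypothetical layout: a small header naming the language, followed by a
// single-byte-per-character payload interpreted against that language's
// alphabet.  Purely illustrative.
struct TaggedString
{
    ushort langId;    // placeholder tag, e.g. 0x0002 for Cyrillic
    ubyte[] payload;  // one byte per character
}

size_t encodedSize(const TaggedString s)
{
    return ushort.sizeof + s.payload.length;  // 2-byte header + 1 byte/char
}

void main()
{
    // "привет" as six single-byte characters (CP1251-style values):
    // 8 bytes with the header, versus 12 bytes in UTF-8.
    ubyte[] cyr = [0xEF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2];
    auto s = TaggedString(0x0002, cyr);
    writeln(encodedSize(s));  // prints 8
}

Mixing languages in one string is of course where the metadata gets more involved, which is exactly the trade-off being argued over here.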

Well, when you _like_ a ludicrous encoding like UTF-8, not sure your opinion matters.

It _is_ kind of ludicrous, isn't it? But it really is the least-bad option for the most text. Sorry, bub.

I think we can do a lot better.

Maybe.  But no one's done it yet.

That's what people said about mobile devices for a long time, until about a decade ago. It's time we got this right.

The vast majority of software is written for _one_ language, the local one. You may think otherwise because the software that sells the most and makes the most money is internationalized software like Windows or iOS, since it can be resold into many markets. But as a percentage of lines of code written, such international code is almost nothing.

I'm surprised you think this even matters after talking about web pages. The browser is your most common string processing situation. Nothing else even comes close.

No, it's certainly popular software, but at the scale we're talking about, ie all string processing in all software, it's fairly small. And the vast majority of webapps that handle strings passed from a browser are written to only handle one language, the local one.

largely ignoring the possibilities of the header scheme I suggested.

"Possibilities" that were considered and discarded decades ago by people with way better credentials. The era of single-byte encodings is gone, it won't come back, and good riddance to bad rubbish.

Lol, credentials. :D If you think that matters at all in the face of the blatant stupidity embodied by UTF-8, I don't know what to tell you.

I could call that "trolling" by all of you, :) but I'll instead call it what it likely is, reactionary thinking, and move on.

It's not trolling to call you out for clearly not doing your homework.

That's funny, because it's precisely you and others who haven't done your homework. So are you all trolling me? By your definition of trolling, which btw is not the standard one, _you_ are the one doing it.

I don't think you understand: _you_ are the special case.

Oh, I understand perfectly. _We_ (whoever "we" are) can handle any sequence of glyphs and combining characters (correctly-formed or not) in any language at any time, so we're the special case...?

And you're doing so by mostly using a single-byte encoding for _your own_ Euro-centric languages, ie ASCII, while imposing unnecessary double-byte and triple-byte encodings on everyone else, despite their outnumbering you 10 to 1. That is the very definition of a special case.
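
To put numbers on that asymmetry, here's a quick check of code points versus UTF-8 bytes (D's string .length counts bytes, std.range.walkLength counts code points); the sample strings are arbitrary, one per script:

import std.range : walkLength;
import std.stdio : writefln;

void main()
{
    // .length counts UTF-8 code units (bytes); walkLength counts code points.
    foreach (s; ["hello", "привет", "नमस्ते"])
        writefln("%s: %s code points, %s bytes", s, s.walkLength, s.length);
    // hello:  5 code points,  5 bytes  (ASCII, 1 byte per character)
    // привет: 6 code points, 12 bytes  (Cyrillic, 2 bytes per character)
    // नमस्ते:   6 code points, 18 bytes  (Devanagari, 3 bytes per character)
}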

Yeah, it sounds funny to me, too.

I'm happy to hear you find your privilege "funny," but I'm sorry to tell you, it won't last.

The 5 billion people outside the US and EU are _not the special case_.

Fortunately, it works for them too.

At a higher and unnecessary cost, which is why it won't last.

The problem is all the rest, and those just below them who cannot afford it at all, in part because the tech is not yet as efficient as it could be. Ditching UTF-8 will be one way to make it more efficient.

All right, now you've found the special case; the case where the generic, unambiguous encoding may need to be lowered to something else: people for whom that encoding is suboptimal because of _current_ network constraints.

I fully acknowledge it's a couple billion people and that's nothing to sneeze at, but I also see that it's a situation that will become less relevant over time.

I continue to marvel at your calling a couple billion people "the special case," presumably thinking ~700 million people in the US and EU primarily using the single-byte encoding of ASCII are the general case.

As for the continued relevance of such constrained use, I suggest you read the link Marco provided above. The vast majority of the worldwide literate population doesn't have a smartphone or use a cellular data plan, whereas the opposite is true if you include featurephones, largely because those can be used only for voice. As that article notes, costs for smartphones and 2G data plans will have to come down for them to go wider. That will take decades to roll out, though the basic tech design is mostly settled by now.

The costs will go down by making the tech more efficient, and ditching UTF-8 will be one of the ways the tech will be made more efficient.
