Re: First Impressions!

Patrick Schluter via Digitalmars-d Sun, 03 Dec 2017 04:40:44 -0800

On Saturday, 2 December 2017 at 22:16:09 UTC, Joakim wrote:

On Friday, 1 December 2017 at 23:16:45 UTC, H. S. Teoh wrote:
On Fri, Dec 01, 2017 at 03:04:44PM -0800, Walter Bright via Digitalmars-d wrote:
On 11/30/2017 9:23 AM, Kagamin wrote:
> On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole wrote:
> > Be aware Microsoft is alone in thinking that UTF-16 was awesome. Everybody else standardized on UTF-8 for Unicode.
>
> UCS2 was awesome. UTF-16 is used by Java, JavaScript, Objective-C, Swift, Dart and ms tech, which is 28% of tiobe index.
"was" :-) Those are pretty much pre-surrogate pair designs, or based
on them (Dart compiles to JavaScript, for example).

UCS2 has serious problems:
1. Most strings are in ascii, meaning UCS2 doubles memory consumption. Strings in the executable file are twice the size.
This is not true in Asia, esp. where the CJK block is extensively used. A CJK block character is 3 bytes in UTF-8, meaning that string sizes are 150% of the UCS2 encoding. If your code contains a lot of CJK text, that's a lot of bloat.
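The size figures above are easy to check. Here is a minimal Python sketch; the sample strings and the helper name `encoded_sizes` are illustrative assumptions, not something from the thread. For text that stays inside the Basic Multilingual Plane, UTF-16 without a BOM takes the same number of bytes as UCS2, so `utf-16-le` stands in for UCS2 here.

```python
def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count without BOM).

    For BMP-only text the UTF-16 count equals the UCS2 count.
    """
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Illustrative samples (assumptions): plain ASCII and four CJK ideographs.
samples = {
    "ascii": "hello world",
    "cjk": "\u4e2d\u6587\u5b57\u7b26",
}

for name, text in samples.items():
    utf8, utf16 = encoded_sizes(text)
    print(f"{name}: utf-8={utf8} bytes, utf-16={utf16} bytes, ratio={utf8 / utf16:.2f}")
```

The ASCII sample comes out at half the size in UTF-8, while the CJK sample comes out at 1.5 times the size, which is the 150% mentioned above.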
Yep, that's why five years back many of the major Chinese sites were still not using UTF-8:
http://xahlee.info/w/what_encoding_do_chinese_websites_use.html


Summary

Taiwan sites almost all use UTF-8. Very old ones still use BIG5.

Mainland China sites mostly still use GBK or GB2312, but a few newer ones use UTF-8.

Many top Japan, Korea, sites also use UTF-8, but some uses EUC (Extended Unix Code) variants.


This probably means that UTF-8 might dominate in the future.

mmmh

That led that Chinese guy to also rant against UTF-8 a couple years ago:
http://xahlee.info/comp/unicode_utf8_encoding_propaganda.html

A rant from someone who reproaches a video for not providing reasons why utf-8 is good, while himself not providing any reasons why utf-8 is bad. I'm not denying the issues with utf-8, only that the ranter doesn't provide any useful info on what issues the "Asians" encounter with it, besides legacy reasons (which are important but do not enter into judging the technical quality of an encoding).

Add to that that he advocates for GB18030, which is quite inferior to utf-8 except in the legacy support area (here are some of the advantages of utf-8 that GB18030 does not possess: auto-synchronization, algorithmic mapping of codepoints, error detection).

If his only beef with utf-8 is the size for CJK text, then he shouldn't argue for UTF-32 as he seems to do at the end.
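To make the auto-synchronization and error detection points concrete, here is a small Python sketch. The function names `is_continuation` and `resync` and the sample string are assumptions chosen for illustration. It relies only on the fact that every UTF-8 continuation byte has the bit pattern 10xxxxxx, so a decoder dropped at an arbitrary byte offset can find the next character boundary without any outside context.

```python
def is_continuation(byte):
    """True for UTF-8 continuation bytes, which always look like 0b10xxxxxx."""
    return (byte & 0xC0) == 0x80


def resync(data, offset):
    """Advance from an arbitrary byte offset to the next character boundary."""
    while offset < len(data) and is_continuation(data[offset]):
        offset += 1
    return offset


# "a", one CJK ideograph (3 bytes in UTF-8), then "b": 5 bytes in total.
data = "a\u4e2db".encode("utf-8")

# Offset 2 lands in the middle of the 3-byte sequence.
try:
    data[2:].decode("utf-8")
except UnicodeDecodeError:
    # Error detection: a stray continuation byte is never valid as a start.
    print("offset 2 is not a character boundary")

start = resync(data, 2)
print(start, data[start:].decode("utf-8"))
```

Skipping at most three continuation bytes is always enough to recover in well-formed UTF-8, since no sequence is longer than four bytes.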
