On Friday, 1 December 2017 at 23:16:45 UTC, H. S. Teoh wrote:
> On Fri, Dec 01, 2017 at 03:04:44PM -0800, Walter Bright via Digitalmars-d wrote:
>> On 11/30/2017 9:23 AM, Kagamin wrote:
>>> On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole wrote:
>>>> Be aware Microsoft is alone in thinking that UTF-16 was awesome. Everybody else standardized on UTF-8 for Unicode.
>>>
>>> UCS2 was awesome. UTF-16 is used by Java, JavaScript, Objective-C, Swift, Dart and MS tech, which is 28% of the TIOBE index.
"was" :-) Those are pretty much pre-surrogate pair designs, or
based
on them (Dart compiles to JavaScript, for example).
UCS2 has serious problems:
1. Most strings are in ascii, meaning UCS2 doubles memory
consumption. Strings in the executable file are twice the size.
> This is not true in Asia, especially where the CJK block is extensively used. A CJK block character is 3 bytes in UTF-8, meaning that string sizes are 150% of the UCS2 encoding. If your code contains a lot of CJK text, that's a lot of bloat.
That's true in theory; in practice it's not that severe, because CJK text is almost never isolated and appears embedded in a lot of ASCII. You can read a case study [1] which shows UTF-8 sizes of 106% of the UTF-16 size for Simplified Chinese, 76% for Traditional Chinese, 129% for Japanese and 94% for Korean. Those numbers are for pure text; publish it on the web embedded in bloated HTML and there goes the size advantage of UTF-16.
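As a rough sanity check, the ratios are easy to reproduce with any codec library; here is a small Python sketch (the sample strings are mine, not taken from the linked study, and UTF-16-LE without BOM stands in for UCS2 since all the characters here are in the BMP):

```python
# Compare encoded byte sizes of UTF-8 vs UTF-16 for ASCII, CJK,
# and mixed (CJK embedded in ASCII markup) sample strings.
samples = {
    "ascii": "hello, world",   # pure ASCII: UTF-8 is half of UTF-16
    "cjk":   "你好世界",        # pure CJK: UTF-8 is 150% of UTF-16
    "mixed": "<p>你好</p>",     # CJK inside ASCII markup: below 100%
}

for name, s in samples.items():
    u8 = len(s.encode("utf-8"))
    u16 = len(s.encode("utf-16-le"))  # no BOM; 2 bytes/char for BMP text
    print(f"{name}: utf-8={u8} B, utf-16={u16} B, ratio={u8 / u16:.0%}")
# ascii: utf-8=12 B, utf-16=24 B, ratio=50%
# cjk:   utf-8=12 B, utf-16=8 B,  ratio=150%
# mixed: utf-8=13 B, utf-16=18 B, ratio=72%
```

The mixed case is the interesting one: even a little ASCII markup around the CJK text pulls the ratio back under 100%, which is exactly the effect the case study numbers show.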
> But then again, in non-Latin locales you'd generally store your strings separately from the executable (usually in l10n files), so this may not be that big an issue. But the blanket statement "Most strings are in ASCII" is not correct.
False, in the sense that isolated pure text is rare: it is generally delivered inside some file format, most of the time ASCII-based, like docx, odf, tmx, xliff, Akoma Ntoso, etc.
[1]: https://stackoverflow.com/questions/6883434/at-all-times-text-encoded-in-utf-8-will-never-give-us-more-than-a-50-file-size