On 11/30/2017 9:23 AM, Kagamin wrote:
> On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole wrote:
>> Be aware Microsoft is alone in thinking that UTF-16 was awesome. Everybody else standardized on UTF-8 for Unicode.
>
> UCS2 was awesome. UTF-16 is used by Java, JavaScript, Objective-C, Swift, Dart and Microsoft tech, which is 28% of the TIOBE index.

"was" :-) Those are pretty much pre-surrogate pair designs, or based on them (Dart compiles to JavaScript, for example).

UCS2 has serious problems:

1. Most strings are in ASCII, meaning UCS2 doubles memory consumption. Strings in the executable file are twice the size.

2. The code doesn't work well with C. C doesn't even have a UCS2 type.

3. There's no reasonable way to audit the code to see if it handles surrogate pairs correctly. Surrogate pairs occur only rarely, so the code is never tested against them, and the bugs may remain latent for many, many years.
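
To make points 1 and 3 concrete, here is a rough Python sketch (not from the original post). The helper names `utf16_units` and `is_high_surrogate` are made up for illustration, and the "truncate to N code units" step is just an assumed stand-in for the kind of code that treats UTF-16 as if it were UCS2.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints (hypothetical helper)."""
    raw = text.encode("utf-16-le")  # little-endian, no BOM
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def is_high_surrogate(unit):
    """True if the code unit is the first half of a surrogate pair."""
    return 0xD800 <= unit <= 0xDBFF


# Point 1: pure ASCII text takes twice the space in a 16-bit encoding.
ascii_text = "hello, world"
utf8_size = len(ascii_text.encode("utf-8"))
utf16_size = len(ascii_text.encode("utf-16-le"))

# Point 3: code that assumes one code unit per character works for
# almost all input, then silently splits a pair on the rare input
# that contains a code point above U+FFFF.
sample = "a\U0001F600"            # 'a' followed by one emoji
units = utf16_units(sample)       # three code units, not two
truncated = units[:2]             # "first two characters" under the UCS2 assumption
splits_pair = is_high_surrogate(truncated[-1])

print(utf8_size, utf16_size, [hex(u) for u in units], splits_pair)
```

The truncation leaves a lone high surrogate at the end, which is exactly the kind of bug that never shows up with test data drawn from the Basic Multilingual Plane.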

With UTF-8, multibyte code points are much more common, so bugs are detected much earlier.
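
A small Python sketch of why that is (again an illustration, not part of the original post; the function names and sample strings are assumptions chosen for the example). It checks which everyday strings already need more than one code unit per code point in each encoding.

```python
def utf8_is_multibyte(text):
    """True if at least one code point needs more than one UTF-8 byte."""
    return len(text.encode("utf-8")) != len(text)


def utf16_needs_surrogates(text):
    """True if at least one code point needs a surrogate pair in UTF-16."""
    return len(text.encode("utf-16-le")) // 2 != len(text)


# Sample inputs chosen for illustration only.
samples = ["plain ascii", "café", "Привет", "日本語", "\U0001F600"]

# Map each sample to (multibyte in UTF-8?, surrogate pair in UTF-16?).
results = {s: (utf8_is_multibyte(s), utf16_needs_surrogates(s)) for s in samples}

for s, (multi8, pair16) in results.items():
    print(ascii(s), "UTF-8 multibyte:", multi8, "UTF-16 surrogates:", pair16)
```

Any accented, Cyrillic or CJK text exercises the multibyte path in UTF-8, while only the last sample exercises the surrogate path in UTF-16, so a one-unit-per-character assumption gets caught by ordinary test data in the first case and rarely in the second.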
