On Thursday, November 30, 2017 03:37:37 Walter Bright via Digitalmars-d wrote:
> On 11/30/2017 2:39 AM, Joakim wrote:
> > Java, .NET, Qt, Javascript, and a handful of others use UTF-16 too, some
> > starting off with the earlier UCS-2:
> >
> > https://en.m.wikipedia.org/wiki/UTF-16#Usage
> >
> > Not saying either is better, each has their flaws, just pointing out
> > it's more than just Windows.
>
> I stand corrected.
I get the impression that the stuff that uses UTF-16 is mostly stuff that picked an encoding early on in the Unicode game and thought that they picked one that guaranteed that a code unit would be an entire character. Many of them picked UCS-2 and then switched later to UTF-16, but once they picked a 16-bit encoding, they were kind of stuck. Others - most notably C/C++ and the *nix world - picked UTF-8 for backwards compatibility, and once it became clear that UCS-2 / UTF-16 wasn't going to cut it for a code unit representing a character, most stuff that went Unicode went UTF-8.

Language-wise, I think that most of the UTF-16 usage is driven by the fact that Java went with UCS-2 / UTF-16, and C# followed it (both because it was copying Java and because the Win32 API had gone with UCS-2 / UTF-16). So, that's had a lot of influence on folks, though most others have gone with UTF-8 for backwards compatibility and because it typically takes up less space for non-Asian text. But the use of UTF-16 in Windows, Java, and C# does seem to have resulted in some folks thinking that wide characters mean Unicode and narrow characters mean ASCII.

I really wish that everything would just go to UTF-8 and that UTF-16 would die, but that would just break too much code. And if we were willing to do that, I'm sure that we could come up with a better encoding than UTF-8 (e.g. getting rid of Unicode normalization as a thing and never having multiple encodings for the same character), but _that_'s never going to happen.

- Jonathan M Davis
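
A minimal D sketch of the two points above (a UTF-16 code unit isn't necessarily a whole character, and normalization gives the same character more than one encoding), assuming a stock dmd/Phobos toolchain; it is only illustrative, not anything from the post itself:

import std.range : walkLength;
import std.stdio : writeln;
import std.uni : NFC, NFD, normalize;

void main()
{
    // U+1F600 lies outside the BMP, so UTF-16 needs a surrogate pair for it.
    wstring w = "\U0001F600"w;
    writeln(w.length);      // 2 UTF-16 code units...
    writeln(w.walkLength);  // ...for a single code point

    // "é" (U+00E9) can be one precomposed code point (NFC) or
    // 'e' followed by a combining acute accent (NFD).
    auto composed   = normalize!NFC("\u00E9");
    auto decomposed = normalize!NFD("\u00E9");
    writeln(composed == decomposed);                      // false: different code unit sequences
    writeln(composed.length, " vs ", decomposed.length);  // 2 vs 3 UTF-8 code units
}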
