On Thursday, 30 November 2017 at 10:19:18 UTC, Walter Bright wrote:
On 11/27/2017 7:01 PM, A Guy With an Opinion wrote:
+- Unicode support is good, although I think D's string type should probably have been UTF16 by default, especially considering the utf module states:

"UTF character support is restricted to '\u0000' <= character <= '\U0010FFFF'."

Seems like the natural fit to me. Plus, for the vast majority of use cases, I'm pretty much guaranteed that one char equals one code point. Not the biggest issue in the world, and maybe I'm just being overly critical here.

Sooner or later your code will exhibit bugs if it assumes that char == code point with UTF16, because of surrogate pairs.

https://stackoverflow.com/questions/5903008/what-is-a-surrogate-pair-in-java
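To make the pitfall concrete, here is a minimal D sketch (the non-BMP code point U+1F600 is just an arbitrary example of a character needing a surrogate pair): one such character occupies two wchar code units in a wstring, so .length counts code units, not characters.

import std.stdio;
import std.range : walkLength;

void main()
{
    // U+1F600 lies outside the Basic Multilingual Plane, so UTF-16
    // encodes it as a surrogate pair: two wchar code units.
    wstring s = "\U0001F600"w;

    writeln(s.length);     // 2 -- UTF-16 code units
    writeln(s.walkLength); // 1 -- decoded code points
}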

As far as I can tell, pretty much the only users of UTF16 are Windows programs. Everyone else uses UTF8 or UTF32.

I recommend using UTF8.

As long as you understand its limitations, I think most bugs can be avoided. Where UTF16 breaks down is pretty well defined, and also super rare. I think UTF32 would be great too, but it seems like a waste of space 99% of the time. UTF8 isn't horrible, and I'm not going to refuse to use D just because it uses UTF8 (that would be silly), especially since wstring is also baked into the language. However, it can complicate code, because outside of ASCII you pretty much always have to assume character != code point. I can see a reasonable person arguing that forcing you to assume character != code point is actually a good thing, and that is a valid opinion.
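The same mismatch is easy to demonstrate for UTF8; a minimal sketch (the sample string is arbitrary), showing the char count versus the code point count and how to iterate by code point:

import std.stdio;
import std.range : walkLength;

void main()
{
    string s = "héllo";    // 'é' takes two UTF-8 code units

    writeln(s.length);     // 6 -- chars (UTF-8 code units)
    writeln(s.walkLength); // 5 -- Unicode code points

    // An explicit dchar loop variable makes foreach decode code points.
    foreach (dchar c; s)
        write(c, ' ');     // h é l l o
    writeln();
}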
