Timon Gehr wrote:
Me too. I think the way we have it now is optimal. The only reason we
are discussing this is because of fear that uneducated users will write
code that does not take into account Unicode characters above code point
0x80.
+1
From D's string docs:
"char[] strings are in UTF-8 format. wchar[] strings are in UTF-16
format. dchar[] strings are in UTF-32 format."
I would additionally add some clarifications:
char[] is an array of 8-bit code units. A Unicode code point may take up
to 4 chars.
wchar[] is an array of 16-bit code units. A Unicode code point may take up
to 2 wchars.
dchar[] is an array of 32-bit code units. A Unicode code point always fits
into one dchar.
Each of these formats may encode any Unicode string.
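To make the code unit counts concrete, here is a small sketch. The sample text is my own choice, not something from the docs: 'a', U+00E9 and U+1F600, picked so that each encoding needs a different number of code units. Note that .length counts code units, while walkLength from std.range counts code points.

```d
import std.range : walkLength;
import std.stdio : writeln;

void main()
{
    // Same three code points in each encoding (sample text is an assumption):
    // 'a' (ASCII), U+00E9 (in the BMP), U+1F600 (outside the BMP).
    string  s = "a\u00E9\U0001F600";
    wstring w = "a\u00E9\U0001F600"w;
    dstring d = "a\u00E9\U0001F600"d;

    // .length is the number of code units, not code points.
    writeln(s.length); // 1 + 2 + 4 chars
    writeln(w.length); // 1 + 1 + 2 wchars
    writeln(d.length); // one dchar per code point

    // walkLength decodes as it goes and so counts code points.
    writeln(s.walkLength, " ", w.walkLength, " ", d.walkLength);
}
```

All three variables hold the same text; only the number of code units per code point differs.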
If you need indexing or slicing by code point, use:
* char[] or string when working with ASCII code points.
* wchar[] or wstring when working with Basic Multilingual Plane (BMP)
code points.
* dchar[] or dstring when working with all possible code points.
If you do not need indexing or slicing, you may use any of the formats.
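A rough illustration of why the choice matters for indexing and slicing, assuming a sample string "héllo" of my own: indexing a char[] gives back a code unit, and a slice that cuts through a multi-unit code point is no longer valid UTF-8. I use std.utf.validate here only to detect the bad slice.

```d
import std.exception : assertThrown;
import std.stdio : writeln;
import std.utf : UTFException, validate;

void main()
{
    // Sample text is an assumption; U+00E9 occupies s[1] and s[2] in UTF-8.
    string  s = "h\u00E9llo";
    dstring d = "h\u00E9llo"d;

    // Indexing yields code units, not code points.
    assert(s[1] == 0xC3);      // lead byte of U+00E9, not the character
    assert(d[1] == '\u00E9');  // in a dstring, one index is one code point

    // Slicing between the two code units of U+00E9 leaves invalid UTF-8.
    assertThrown!UTFException(validate(s[0 .. 2]));

    // Slicing on a code point boundary is fine.
    validate(s[0 .. 3]);
    writeln(s[0 .. 3]);
}
```

With dstring the same indices always land on code point boundaries, which is the point of the list above.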